[1] The Progress by Population Group analysis is a component of the Healthy People 2020 (HP2020) Final Review. The analysis included subsets of the 1,111 measurable HP2020 objectives that have data available for any of six broad population characteristics: sex, race and ethnicity, educational attainment, family income, disability status, and geographic location. Progress toward meeting HP2020 targets is presented for up to 24 population groups within these characteristics, based on objective data aggregated across HP2020 topic areas. The Progress by Population Group data are also available at the individual objective level in the downloadable data set.
[2] The final value was generally based on data available on the HP2020 website as of January 2020. For objectives that are continuing into HP2030, more recent data will be included on the HP2030 website as it becomes available: https://health.gov/healthypeople.
[3] For more information on the HP2020 methodology for measuring progress toward target attainment and the elimination of health disparities, see: Healthy People Statistical Notes, no 27; available from: https://www.cdc.gov/nchs/data/statnt/statnt27.pdf.
[4] Status for objectives included in the HP2020 Progress by Population Group analysis was determined using the baseline, final, and target value. The progress status categories used in HP2020 were:
a. Target met or exceeded. One of the following applies: (i) At baseline, the target was not met or exceeded, and the most recent value was equal to or exceeded the target (the percentage of targeted change achieved was equal to or greater than 100%); (ii) The baseline and most recent values were equal to or exceeded the target (the percentage of targeted change achieved was not assessed).
b. Improved. One of the following applies: (i) Movement was toward the target, standard errors were available, and the percentage of targeted change achieved was statistically significant; (ii) Movement was toward the target, standard errors were not available, and the objective had achieved 10% or more of the targeted change.
c. Little or no detectable change. One of the following applies: (i) Movement was toward the target, standard errors were available, and the percentage of targeted change achieved was not statistically significant; (ii) Movement was toward the target, standard errors were not available, and the objective had achieved less than 10% of the targeted change; (iii) Movement was away from the baseline and target, standard errors were available, and the percent change relative to the baseline was not statistically significant; (iv) Movement was away from the baseline and target, standard errors were not available, and the objective had moved less than 10% relative to the baseline; (v) No change was observed between the baseline and the final data point.
d. Got worse. One of the following applies: (i) Movement was away from the baseline and target, standard errors were available, and the percent change relative to the baseline was statistically significant; (ii) Movement was away from the baseline and target, standard errors were not available, and the objective had moved 10% or more relative to the baseline.
NOTE: Measurable objectives had baseline data.
SOURCE: National Center for Health Statistics, Healthy People 2020 Progress by Population Group database.
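The status categories in note [4] amount to a small decision procedure, which the following Python sketch tries to make explicit. It is an illustration only, not the NCHS implementation. The function names, the `higher_is_better` flag, and the `significant` argument (the result of a significance test carried out elsewhere, or `None` when standard errors are unavailable) are assumptions introduced here. The percentage of targeted change achieved is taken to be (final - baseline) / (target - baseline) x 100.

```python
from typing import Optional

LITTLE = "Little or no detectable change"


def meets_target(value: float, target: float, higher_is_better: bool) -> bool:
    """True when the value equals or exceeds the target in the desired direction."""
    return value >= target if higher_is_better else value <= target


def progress_status(baseline: float, final: float, target: float,
                    higher_is_better: bool,
                    significant: Optional[bool] = None) -> str:
    """Assign one of the four HP2020 progress categories (hypothetical helper).

    `significant` is None when standard errors are not available; otherwise it
    carries the outcome of a significance test that is not implemented here.
    """
    # a. Target met or exceeded (covers cases a(i) and a(ii)).
    if meets_target(final, target, higher_is_better):
        return "Target met or exceeded"
    # c(v). No change between the baseline and the final data point.
    if final == baseline:
        return LITTLE
    moved_toward = (final > baseline) == higher_is_better
    if moved_toward:
        if significant is not None:            # b(i) / c(i)
            return "Improved" if significant else LITTLE
        # b(ii) / c(ii): 10% threshold on the targeted change achieved.
        pct_of_target = 100.0 * (final - baseline) / (target - baseline)
        return "Improved" if pct_of_target >= 10.0 else LITTLE
    if significant is not None:                # d(i) / c(iii)
        return "Got worse" if significant else LITTLE
    # d(ii) / c(iv): 10% threshold on change relative to the baseline.
    # Assumes a non-zero baseline; a zero baseline would need separate handling.
    pct_change = 100.0 * abs(final - baseline) / abs(baseline)
    return "Got worse" if pct_change >= 10.0 else LITTLE
```

For example, `progress_status(20.0, 22.0, 30.0, higher_is_better=True)` covers 20% of the targeted change without standard errors, so it falls under "Improved".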
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
The IRS publishes migration data for the US population based upon the individual tax returns filed with the IRS, where they track on a year-by-year basis
The raw data published on the IRS website clearly show patterns of evolution: changes in what is recorded, how it is recorded, and the naming conventions used. This makes it a challenge to track changes in the underlying data over time. The current dataset attempts to address these shortcomings by normalizing the record layout, standardizing the conventions, and collecting the annual data into a single, coherent dataset.
An individual record is laid out with 11 fields:
Y1 Y1_STATE_FIPS Y1_STATE_ABBR Y1_STATE_NAME Y2 Y2_STATE_FIPS Y2_STATE_ABBR Y2_STATE_NAME NUM_RETURNS NUM_EXEMPTIONS AGI
Here, Y1 refers to the first year (from where the people are migrating) while Y2 refers to the second year (to where the people are migrating). As this is annual data, Y2 should always be the next year after Y1. Associated with each year are three different ways of identifying a state: the name of the state, its two-letter abbreviation, and its FIPS code. Granted, carrying around three IDs per state is redundant; however, the various IDs are useful in different contexts. One thing to note: the IRS data represents migration into and out of the country via the introduction of a fake state, identified by STATE_NAME=FOREIGN, STATE_ABBR=FR, and STATE_FIPS=57.
From any given state, the dataset records migration to 52 destinations
Similarly, the dataset represents the migration into any given state as being from one of 52 origins. Typically, the numbers associated with "staying put" constitute, by far, the largest contingent of taxpayers for the given state. The one exception to this description is the FOREIGN state. The dataset does not record "staying put" outside of the country; there is no record for FOREIGN-to-FOREIGN migration. As such, there are 51, not 52, destinations paired with migration to and from the FOREIGN state.
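The layout rules described above can be checked mechanically. The sketch below is a minimal illustration that assumes each record has already been loaded into a Python dict keyed by the field names listed earlier. The helper names and the sample values are hypothetical, and the file's delimiter and data types are not specified in the description.

```python
FIELDS = [
    "Y1", "Y1_STATE_FIPS", "Y1_STATE_ABBR", "Y1_STATE_NAME",
    "Y2", "Y2_STATE_FIPS", "Y2_STATE_ABBR", "Y2_STATE_NAME",
    "NUM_RETURNS", "NUM_EXEMPTIONS", "AGI",
]
FOREIGN_FIPS = 57  # the fake "FOREIGN" state described above


def check_record(rec: dict) -> list:
    """Return a list of rule violations for one record (empty when it looks fine)."""
    missing = [name for name in FIELDS if name not in rec]
    if missing:
        return [f"missing fields: {missing}"]
    problems = []
    # Annual data: Y2 should always be the year after Y1.
    if int(rec["Y2"]) != int(rec["Y1"]) + 1:
        problems.append("Y2 is not the year after Y1")
    # There is no FOREIGN-to-FOREIGN migration record.
    if (int(rec["Y1_STATE_FIPS"]) == FOREIGN_FIPS
            and int(rec["Y2_STATE_FIPS"]) == FOREIGN_FIPS):
        problems.append("FOREIGN-to-FOREIGN record should not exist")
    return problems


def expected_destinations(state_fips) -> int:
    """52 destinations per state, 51 for the FOREIGN state."""
    return 51 if int(state_fips) == FOREIGN_FIPS else 52


# Made-up sample values, for illustration only.
sample = {
    "Y1": 2010, "Y1_STATE_FIPS": 17, "Y1_STATE_ABBR": "IL", "Y1_STATE_NAME": "ILLINOIS",
    "Y2": 2011, "Y2_STATE_FIPS": 48, "Y2_STATE_ABBR": "TX", "Y2_STATE_NAME": "TEXAS",
    "NUM_RETURNS": 100, "NUM_EXEMPTIONS": 200, "AGI": 5000,
}
```

Running `check_record(sample)` returns an empty list, while a record with Y2 two years after Y1 would be flagged.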
Annual number of interprovincial migrants by province of origin and destination, Canada, provinces and territories.
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
On 6/28/2023, data on cases by vaccination status will be archived and will no longer update.
A. SUMMARY This dataset represents San Francisco COVID-19 positive confirmed cases by vaccination status over time, starting January 1, 2021. Cases are included on the date the positive test was collected (the specimen collection date). Cases are counted in three categories: (1) all cases; (2) unvaccinated cases; and (3) completed primary series cases.
All cases: Includes cases among all San Francisco residents regardless of vaccination status.
Unvaccinated cases: Cases are considered unvaccinated if their positive COVID-19 test was before receiving any vaccine. Cases that are not matched to a COVID-19 vaccination record are considered unvaccinated.
Completed primary series cases: Cases are considered completed primary series if their positive COVID-19 test was 14 days or more after they received their 2nd dose in a 2-dose COVID-19 series or the single dose of a 1-dose vaccine. These are also called “breakthrough cases.”
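The three category definitions can be read as a rule applied to each case's specimen collection date and matched dose dates. The sketch below is one possible reading, not the City's actual pipeline: the function name and the `doses_in_series` parameter are assumptions, and a test collected on the same day as the first dose is treated here as not unvaccinated.

```python
from datetime import date, timedelta
from typing import List, Sequence


def case_categories(test_date: date, dose_dates: Sequence[date],
                    doses_in_series: int = 2) -> List[str]:
    """Return the categories in which a single case would be counted."""
    categories = ["all"]  # every case counts toward "all cases"
    doses = sorted(dose_dates)
    if not doses or test_date < doses[0]:
        # No matched vaccination record, or positive test before any dose.
        categories.append("unvaccinated")
    elif len(doses) >= doses_in_series:
        final_dose = doses[doses_in_series - 1]
        # Positive test 14 days or more after the final dose of the series.
        if test_date >= final_dose + timedelta(days=14):
            categories.append("completed primary series")
    return categories
```

A case that tests positive between the first dose and 14 days after the final dose is counted only under "all", which matches the three-category description above.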
On September 12, 2021, a new case definition of COVID-19 was introduced that includes criteria for enumerating new infections after previous probable or confirmed infections (also known as reinfections). A reinfection is defined as a confirmed positive PCR lab test more than 90 days after a positive PCR or antigen test. The first reinfection case was identified on December 7, 2021.
Data is lagged by eight days, meaning the most recent specimen collection date included is eight days prior to today. All data updates daily as more information becomes available.
B. HOW THE DATASET IS CREATED Case information is based on confirmed positive laboratory tests reported to the City. The City then completes quality assurance and other data verification processes. Vaccination data comes from the California Immunization Registry (CAIR2). The California Department of Public Health runs CAIR2. Individual-level case and vaccination data are matched to identify cases by vaccination status in this dataset. Case records are matched to vaccine records using first name, last name, date of birth, phone number, and email address.
We include vaccination records from all nine Bay Area counties in order to improve matching rates. This allows us to identify breakthrough cases among people who moved to the City from other Bay Area counties after completing their vaccine series. Only cases among San Francisco residents are included.
C. UPDATE PROCESS Updates automatically at 08:00 AM Pacific Time each day.
D. HOW TO USE THIS DATASET Total San Francisco population estimates can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS). To identify total San Francisco population estimates, filter the view on “demographic_category_label” = “all ages”.
Population estimates by vaccination status are derived from our publicly reported vaccination counts, which can be found at COVID-19 Vaccinations Given to SF Residents Over Time.
The dataset includes new cases, 7-day average new cases, new case rates, 7-day average new case rates, percent of total cases, and 7-day average percent of total cases for each vaccination category.
New cases are the count of cases where the positive tests were collected on that specific specimen collection date. The 7-day rolling average shows the trend in new cases. The rolling average is calculated by averaging the new cases for a particular day with the prior 6 days.
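As a rough illustration of the rolling average described above (each day averaged with the prior 6 days), here is a minimal Python sketch. The function name is hypothetical, and returning `None` for the first six days, where a full window is not available, is an assumption; the description does not say how those days are handled.

```python
from typing import List, Optional, Sequence


def rolling_average(values: Sequence[float], window: int = 7) -> List[Optional[float]]:
    """Average each day's value with the prior (window - 1) days."""
    averages: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            averages.append(None)  # not enough prior days yet (assumption)
        else:
            chunk = values[i - window + 1 : i + 1]
            averages.append(sum(chunk) / window)
    return averages


daily_new_cases = [7, 14, 21, 28, 35, 42, 49, 56]  # made-up counts
```

With the made-up counts above, the seventh day averages to 28.0 and the eighth to 35.0.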
New case rates are the count of new cases per 100,000 residents in each vaccination status group. The 7-day rolling average shows the trend in case rates. The rolling average is calculated by averaging the case rate for a particular day with the prior six days. Percent of total new cases shows the percent of all cases on each day that were among a particular vaccination status.
Here is more information on how each case rate is calculated:
The case rate for all cases is equal to the number of new cases among all residents divided by the estimated total resident population.
Unvaccinated case rates are equal to the number of new cases among unvaccinated residents divided by the estimated number of unvaccinated residents. The estimated number of unvaccinated residents is calculated by subtracting the number of residents that have received at least one dose of a vaccine from the total estimated resident population.
Completed primary series case rates are equal to the number of new cases among completed primary series residents divided by the estimated number of completed primary series residents. The estimated number of completed primary series residents is calculated by taking the number of residents who have completed their primary series over time and adding a 14-day delay to the “date_administered” column, to align with the definition of “Completed primary series cases” above.
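The three rate calculations can be sketched in a few lines of Python. The function names and the shape of the inputs (a dict mapping each "date_administered" to the number of residents completing their series that day) are assumptions made for illustration; the published dataset may be organised differently.

```python
from datetime import date, timedelta
from typing import Dict


def rate_per_100k(new_cases: int, population: int) -> float:
    """New cases per 100,000 residents in the given group."""
    return 100_000 * new_cases / population


def unvaccinated_population(total_population: int, at_least_one_dose: int) -> int:
    """Total estimated residents minus residents with at least one dose."""
    return total_population - at_least_one_dose


def completed_series_population(completed_by_date: Dict[date, int], day: date) -> int:
    """Residents whose series was completed at least 14 days before `day`."""
    cutoff = day - timedelta(days=14)  # the 14-day delay on date_administered
    return sum(n for administered, n in completed_by_date.items()
               if administered <= cutoff)
```

The denominators change daily as vaccination counts grow, so each day's rate would use that day's population estimate for the group.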
E. CHANGE LOG
Background
The Labour Force Survey (LFS) is a unique source of information using international definitions of employment and unemployment and economic inactivity, together with a wide range of related topics such as occupation, training, hours of work and personal characteristics of household members aged 16 years and over. It is used to inform social, economic and employment policy. The LFS was first conducted biennially from 1973-1983. Between 1984 and 1991 the survey was carried out annually and consisted of a quarterly survey conducted throughout the year and a 'boost' survey in the spring quarter (data were then collected seasonally). From 1992 quarterly data were made available, with a quarterly sample size approximately equivalent to that of the previous annual data. The survey then became known as the Quarterly Labour Force Survey (QLFS). From December 1994, data gathering for Northern Ireland moved to a full quarterly cycle to match the rest of the country, so the QLFS then covered the whole of the UK (though some additional annual Northern Ireland LFS datasets are also held at the UK Data Archive). Further information on the background to the QLFS may be found in the documentation.
Longitudinal data
The LFS retains each sample household for five consecutive quarters, with a fifth of the sample replaced each quarter. The main survey was designed to produce cross-sectional data, but the data on each individual have now been linked together to provide longitudinal information. The longitudinal data comprise two types of linked datasets, created using the weighting method to adjust for non-response bias. The two-quarter datasets link data from two consecutive waves, while the five-quarter datasets link across a whole year (for example January 2010 to March 2011 inclusive) and contain data from all five waves. A full series of longitudinal data has been produced, going back to winter 1992. Linking together records to create a longitudinal dimension can, for example, provide information on gross flows over time between different labour force categories (employed, unemployed and economically inactive). This will provide detail about people who have moved between the categories. Also, longitudinal information is useful in monitoring the effects of government policies and can be used to follow the subsequent activities and circumstances of people affected by specific policy initiatives, and to compare them with other groups in the population. There are however methodological problems which could distort the data resulting from this longitudinal linking. The ONS continues to research these issues and advises that the presentation of results should be carefully considered, and warnings should be included with outputs where necessary.
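To illustrate what a gross-flows tabulation from a two-quarter linked dataset might look like, here is a minimal Python sketch. It assumes each linked record has been reduced to a tuple of (status in the first quarter, status in the second quarter, longitudinal weight); the function name, the status labels and the weights are hypothetical and are not taken from the LFS variable list.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def gross_flows(linked_records: Iterable[Tuple[str, str, float]]) -> Dict[Tuple[str, str], float]:
    """Sum longitudinal weights for each (from_status, to_status) pair."""
    flows: Dict[Tuple[str, str], float] = defaultdict(float)
    for status_start, status_end, weight in linked_records:
        flows[(status_start, status_end)] += weight
    return dict(flows)


# Made-up linked records, for illustration only.
records = [
    ("employed", "employed", 900.0),
    ("employed", "unemployed", 40.0),
    ("unemployed", "employed", 60.0),
    ("unemployed", "employed", 20.0),
    ("inactive", "inactive", 300.0),
]
```

The off-diagonal entries, such as ("unemployed", "employed"), are the gross flows between categories; as the text notes, results like these should be presented with appropriate caution.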
New reweighting policy
Following the new reweighting policy, ONS has reviewed the latest population estimates made available during 2019 and has decided not to carry out a 2019 LFS and APS reweighting exercise. Therefore, the next reweighting exercise will take place in 2020. This will incorporate the 2019 Sub-National Population Projection data (published in May 2020) and 2019 Mid-Year Estimates (published in June 2020). It is expected that reweighted Labour Market aggregates and microdata will be published towards the end of 2020/early 2021.
LFS Documentation
The documentation available from the Archive to accompany LFS datasets largely consists of the latest version of each user guide volume alongside the appropriate questionnaire for the year concerned. However, volumes are updated periodically by ONS, so users are advised to check the latest documents on the ONS Labour Force Survey - User Guidance pages before commencing analysis. This is especially important for users of older QLFS studies, where information and guidance in the user guide documents may have changed over time.
Additional data derived from the QLFS
The Archive also holds further QLFS series: End User Licence (EUL) quarterly data; Secure Access datasets; household datasets; quarterly, annual and ad hoc module datasets compiled for Eurostat; and some additional annual Northern Ireland datasets.
Variables DISEA and LNGLST
Dataset A08 (Labour market status of disabled people), which ONS suspended due to an apparent discontinuity between April to June 2017 and July to September 2017, is now available. As a result of this apparent discontinuity and the inconclusive investigations at this stage, comparisons should be made with caution between April to June 2017 and subsequent time periods. However, users should note that the estimates are not seasonally adjusted, so some of the change between quarters could be due to seasonality. Further recommendations on historical comparisons of the estimates will be given in November 2018 when ONS is due to publish estimates for July to September 2018.
An article explaining the quality assurance investigations that have been conducted so far is available on the ONS Methodology webpage. For any queries about Dataset A08 please email Labour.Market@ons.gov.uk.
Occupation data for 2021 and 2022 data files
The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. Further information can be found in the ONS article published on 11 July 2023, Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022: https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022
2022 Weighting
The population totals used for the latest LFS estimates use projected growth rates from Real Time Information (RTI) data for UK, EU and non-EU populations based on 2021 patterns. The total population used for the LFS therefore does not take into account any changes in migration, birth rates, death rates, and so on since June 2021, and hence levels estimates may be under- or over-estimating the true values and should be used with caution. Estimates of rates will, however, be robust.
Due to changes in the collection and availability of data on COVID-19, this website will no longer be updated. The webpage will no longer be available as of 11 May 2023. On-going, reliable sources of data for COVID-19 are available via the COVID-19 dashboard and the UKHSA.
GLA Covid-19 Mobility Report
Since March 2020, London has seen many different levels of restrictions, including three separate lockdowns and many other tiers/levels of restrictions, as well as easing of restrictions and even measures to actively encourage people to go to work, their high streets and local restaurants. This report gathers data from a number of sources, including Google, Apple, Citymapper, Purple WiFi and OpenTable, to assess the extent to which these levels of restrictions have translated to a reduction in Londoners' movements.
The data behind the charts below come from different sources. None of these data represent a direct measure of how well people are adhering to the lockdown rules, nor do they provide an exhaustive data set. Rather, they are measures of different aspects of mobility which, together, offer an overall impression of how Londoners are moving around the capital. The information is broken down by use of public transport, pedestrian activity, retail and leisure, and homeworking.
Public Transport
For the transport measures, we have included data from Google, Apple, Citymapper and Transport for London. They measure different aspects of public transport usage, depending on the data source. Each of the lines in the chart below represents a percentage of a pre-pandemic baseline.
activity | Source | Latest | Baseline | Min value in Lockdown 1 | Min value in Lockdown 2 | Min value in Lockdown 3
Citymapper | Citymapper mobility index | 2021-09-05 | Compares trips planned and trips taken within its app to a baseline of the four weeks from 6 Jan 2020 | 7.9% | 28% | 19%
Google | Google Mobility Report | 2022-10-15 | Location data shared by users of Android smartphones; compares time and duration of visits to locations to the median values on the same day of the week in the five weeks from 3 Jan 2020 | 20.4% | 40% | 27%
TfL Bus | Transport for London | 2022-10-30 | Bus journey 'taps' on the TfL network compared to same day of the week in four weeks starting 13 Jan 2020 | - | 34% | 24%
TfL Tube | Transport for London | 2022-10-30 | Tube journey 'taps' on the TfL network compared to same day of the week in four weeks starting 13 Jan 2020 | - | 30% | 21%
Pedestrian activity
With the data we currently have it's harder to estimate pedestrian activity and high street busyness. A few indicators can give us information on how people are making trips out of the house:
activity | Source | Latest | Baseline | Min value in Lockdown 1 | Min value in Lockdown 2 | Min value in Lockdown 3
Walking | Apple Mobility Index | 2021-11-09 | Estimates the frequency of trips made on foot compared to baseline of 13 Jan '20 | 22% | 47% | 36%
Parks | Google Mobility Report | 2022-10-15 | Frequency of trips to parks. Changes in the weather mean this varies a lot. Compared to baseline of 5 weeks from 3 Jan '20 | 30% | 55% | 41%
Retail & Rec | Google Mobility Report | 2022-10-15 | Estimates frequency of trips to shops/leisure locations. Compared to baseline of 5 weeks from 3 Jan '20 | 30% | 55% | 41%
Retail and recreation
In this section, we focus on estimated footfall to shops, restaurants, cafes, shopping centres and so on.
activity | Source | Latest | Baseline | Min value in Lockdown 1 | Min value in Lockdown 2 | Min value in Lockdown 3
Grocery/pharmacy | Google Mobility Report | 2022-10-15 | Estimates frequency of trips to grocery shops and pharmacies. Compared to baseline of 5 weeks from 3 Jan '20 | 32% | 55% | 45%
Retail/rec | Google Mobility Report | 2022-10-15 | Estimates frequency of trips to shops/leisure locations. Compared to baseline of 5 weeks from 3 Jan '20 | 32% | 55% | 45%
Restaurants | OpenTable State of the Industry | 2022-02-19 | London restaurant bookings made through OpenTable | 0% | 0.17% | 0.024%
Home Working
The Google Mobility Report estimates changes in how many people are staying at home and going to places of work compared to normal. It's difficult to translate this into exact percentages of the population, but changes back towards 'normal' can be seen to start before any lockdown restrictions were lifted. This value gives a seven day rolling (mean) average to avoid it being distorted by weekends and bank holidays.
name | Source | Latest | Baseline | Min/max value in Lockdown 1 | Min/max value in Lockdown 2 | Min/max value in Lockdown 3
Residential | Google Mobility Report | 2022-10-15 | Estimates changes in how many people are staying at home for work. Compared to baseline of 5 weeks from 3 Jan '20 | 131% | 119% | 125%
Workplaces | Google Mobility Report | 2022-10-15 | Estimates changes in how many people are going to places of work. Compared to baseline of 5 weeks from 3 Jan '20 | 24% | 54% | 40%
Restriction | Start date | End date | Average Citymapper | Average homeworking
Work from home advised | 17 Mar '20 | 21 Mar '20 | 57% | 118%
Schools, pubs closed | 21 Mar '20 | 24 Mar '20 | 34% | 119%
UK enters first lockdown | 24 Mar '20 | 10 May '20 | 10% | 130%
Some workers encouraged to return to work | 10 May '20 | 01 Jun '20 | 15% | 125%
Schools open, small groups outside | 01 Jun '20 | 15 Jun '20 | 19% | 122%
Non-essential businesses re-open | 15 Jun '20 | 04 Jul '20 | 24% | 120%
Hospitality reopens | 04 Jul '20 | 03 Aug '20 | 34% | 115%
Eat out to help out scheme begins | 03 Aug '20 | 08 Sep '20 | 44% | 113%
Rule of 6 | 08 Sep '20 | 24 Sep '20 | 53% | 111%
10pm Curfew | 24 Sep '20 | 15 Oct '20 | 51% | 112%
Tier 2 (High alert) | 15 Oct '20 | 05 Nov '20 | 49% | 113%
Second Lockdown | 05 Nov '20 | 02 Dec '20 | 31% | 118%
Tier 2 (High alert) | 02 Dec '20 | 19 Dec '20 | 45% | 115%
Tier 4 (Stay at home advised) | 19 Dec '20 | 05 Jan '21 | 22% | 124%
Third Lockdown | 05 Jan '21 | 08 Mar '21 | 22% | 122%
Roadmap 1 | 08 Mar '21 | 29 Mar '21 | 29% | 118%
Roadmap 2 | 29 Mar '21 | 12 Apr '21 | 36% | 117%
Roadmap 3 | 12 Apr '21 | 17 May '21 | 51% | 113%
Roadmap out of lockdown: Step 3 | 17 May '21 | 19 Jul '21 | 65% | 109%
Roadmap out of lockdown: Step 4 | 19 Jul '21 | 07 Nov '22 | 68% | 107%
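Several of the measures above are expressed as a percentage of a pre-pandemic baseline, where the baseline is the median value for the same day of the week over a reference window. The Python sketch below shows one way such a figure could be computed. It is a generic illustration under that stated definition, not the providers' actual method, and the function names and sample values are made up.

```python
from collections import defaultdict
from datetime import date
from statistics import median
from typing import Dict


def weekday_baseline(daily_values: Dict[date, float]) -> Dict[int, float]:
    """Median value per day of the week (0 = Monday) over the baseline window."""
    by_weekday = defaultdict(list)
    for day, value in daily_values.items():
        by_weekday[day.weekday()].append(value)
    return {weekday: median(values) for weekday, values in by_weekday.items()}


def percent_of_baseline(day: date, value: float, baseline: Dict[int, float]) -> float:
    """Express a later day's value as a percentage of the same-weekday baseline."""
    return 100.0 * value / baseline[day.weekday()]


# Made-up Monday counts from a January 2020 baseline window.
baseline_window = {
    date(2020, 1, 6): 100.0,
    date(2020, 1, 13): 120.0,
    date(2020, 1, 20): 110.0,
}
baseline = weekday_baseline(baseline_window)
```

With these made-up numbers, a count of 22 on Monday 30 March 2020 is 20% of the Monday baseline of 110.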
https://creativecommons.org/publicdomain/zero/1.0/
Illinois tax structure and the pension funds have caused many people to worry. The weather does not help either. So this is an effort to find the trend. Looking forward to the 2010-2020 data set, esp. with all the new startup energy in Chicago.
Data was downloaded from https://catalog.data.gov
Is there a data set somewhere that has the migration path of people moving out of IL?
Abstract copyright UK Data Service and data collection copyright owner.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set is a supplement to this Scientific Reports article.
The data set provides estimates of country-level daily mobility metrics (uncertainty included) for 17 countries from March 11, 2020 to present. Estimates are based on more than 3.8 million smartphone trajectories.
Metrics:
Estimated daily average travelled distance by people.
Estimated percentage of people who did not move during the 24 hours of the day.
Countries: Argentina (ARG), Chile (CHL), Colombia (COL), Costa Rica (CRI), Ecuador (ECU), Greece (GRC), Guatemala (GTM), Italy (ITA), Mexico (MEX), Nicaragua (NIC), Panama (PAN), Peru (PER), Philippines (PHL), Slovenia (SVN), Turkey (TUR), United States (USA) and Venezuela (VEN).
Covered period: from March 11, 2020 to present.
Temporal resolution: daily.
Temporal smoothing:
No smoothing.
7-day moving average.
14-day moving average.
21-day moving average.
28-day moving average.
Uncertainty: 95% bootstrap confidence interval.
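For readers unfamiliar with bootstrap intervals, the sketch below computes a generic 95% percentile bootstrap confidence interval for a mean. It is not the methodology developed for this data set, which is described in the accompanying article. The function name, the fixed seed and the made-up distances are assumptions used only to make the example reproducible.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def bootstrap_ci(sample: Sequence[float], n_resamples: int = 2000,
                 level: float = 0.95, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the sample mean."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(sample, k=n)) for _ in range(n_resamples))
    alpha = (1.0 - level) / 2.0
    lower = means[int(alpha * n_resamples)]
    upper = means[int((1.0 - alpha) * n_resamples) - 1]
    return lower, upper


# Made-up daily travelled distances (km) for a handful of trajectories.
distances = [1.2, 0.8, 3.5, 2.1, 0.0, 4.4, 2.9, 1.7, 0.5, 3.1]
low, high = bootstrap_ci(distances)
```

The interval `(low, high)` brackets the sample mean and narrows as the number of trajectories grows.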
Data ownership
Anonymized data on smartphone trajectories are collected, owned and managed by Futura Innovation SRL. Smartphone trajectories are stored and analyzed on servers owned by Futura Innovation SRL and not shared with third parties, including the author of this repository and his organization (University of Bergamo).
Contribution
Ilaria Cremonesi of Futura Innovation SRL is the data owner and data manager.
Francesco Finazzi of University of Bergamo developed the statistical methodology for the data analysis and the algorithms implemented on Futura Innovation SRL servers.
Repository update
CSV files of this repository are regularly produced by Futura Innovation SRL and published by the repository's author after validation.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MGD: Music Genre Dataset
Over recent years, the world has seen a dramatic change in the way people consume music, moving from physical records to streaming services. Since 2017, such services have become the main source of revenue within the global recorded music market. Therefore, this dataset is built by using data from Spotify. It provides a weekly chart of the 200 most streamed songs for each country and territory in which it is present, as well as an aggregated global chart.
Considering that countries behave differently when it comes to musical tastes, we use chart data from global and regional markets from January 2017 to December 2019, considering eight of the top 10 music markets according to IFPI: United States (1st), Japan (2nd), United Kingdom (3rd), Germany (4th), France (5th), Canada (8th), Australia (9th), and Brazil (10th).
We also provide information about the hit songs and artists present in the charts, such as all collaborating artists within a song (since the charts only provide the main ones) and their respective genres, which is the core of this work. MGD also provides data about musical collaboration, as we build collaboration networks based on artist partnerships in hit songs. Therefore, this dataset contains:
Genre Networks: Success-based genre collaboration networks
Genre Mapping: Genre mapping from Spotify genres to super-genres
Artist Networks: Success-based artist collaboration networks
Artists: Some artist data
Hit Songs: Hit Song data and features
Charts: Enhanced data from Spotify Weekly Top 200 Charts
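As a rough idea of how collaboration networks can be derived from hit songs, the Python sketch below counts artist partnerships and projects them onto genres. It is a simplified illustration and not the construction used by the dataset's authors. The function names, artist names and genre labels are hypothetical, and the edge weight used here (number of shared hit songs) is an assumption.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def artist_network(hit_songs: Iterable[List[str]]) -> Counter:
    """Edge weight = number of hit songs in which two artists appear together."""
    edges: Counter = Counter()
    for artists in hit_songs:
        for a, b in combinations(sorted(set(artists)), 2):
            edges[(a, b)] += 1
    return edges


def genre_network(hit_songs: Iterable[List[str]],
                  artist_genres: Dict[str, List[str]]) -> Counter:
    """Link the genres of each pair of collaborating artists."""
    edges: Counter = Counter()
    for artists in hit_songs:
        for a, b in combinations(sorted(set(artists)), 2):
            for genre_a in artist_genres.get(a, []):
                for genre_b in artist_genres.get(b, []):
                    key: Tuple[str, str] = tuple(sorted((genre_a, genre_b)))
                    edges[key] += 1
    return edges


# Made-up songs and genres, for illustration only.
songs = [["Artist A", "Artist B"], ["Artist A", "Artist B", "Artist C"], ["Artist C"]]
genres = {"Artist A": ["pop"], "Artist B": ["hip hop"], "Artist C": ["pop"]}
```

Solo songs contribute no edges, and two collaborating artists of the same genre produce a self-loop on that genre in this simplified projection.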
This dataset was originally built for a conference paper at ISMIR 2020. If you make use of the dataset, please also cite the following paper:
Gabriel P. Oliveira, Mariana O. Silva, Danilo B. Seufitelli, Anisio Lacerda, and Mirella M. Moro. Detecting Collaboration Profiles in Success-based Music Genre Networks. In Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR 2020), 2020.
@inproceedings{ismir/OliveiraSSLM20, title = {Detecting Collaboration Profiles in Success-based Music Genre Networks}, author = {Gabriel P. Oliveira and Mariana O. Silva and Danilo B. Seufitelli and Anisio Lacerda and Mirella M. Moro}, booktitle = {21st International Society for Music Information Retrieval Conference}, pages = {726--732}, year = {2020} }
DESCRIPTION
The TAU Spatial Room Impulse Response Database (TAU-SRIR DB) contains spatial room impulse responses (SRIRs) captured in various spaces of Tampere University (TAU), Finland, for a fixed receiver position and multiple source positions per room, along with separate recordings of spatial ambient noise captured at the same recording point. The dataset is intended for emulation of spatial multichannel recordings for evaluation and/or training of multichannel processing algorithms in realistic reverberant conditions and over multiple rooms. The major distinct properties of the database compared to other databases of room impulse responses are:
Capturing in a high resolution multichannel format (32 channels) from which multiple more limited application-specific formats can be derived (e.g. tetrahedral array, circular array, first-order Ambisonics, higher-order Ambisonics, binaural).
Extraction of densely spaced SRIRs along measurement trajectories, allowing emulation of moving source scenarios.
Multiple source distances, azimuths, and elevations from the receiver per room, allowing emulation of complex configurations for multi-source methods.
Multiple rooms, allowing evaluation of methods at various acoustic conditions, and training of methods with the aim of generalization on different rooms.
The RIRs were collected by staff of TAU between 12/2017 - 06/2018, and between 11/2019 - 1/2020. The data collection received funding from the European Research Council, grant agreement 637422 EVERYSOUND.
NOTE: This database is a work-in-progress. We intend to publish additional rooms, additional formats, and potentially higher-fidelity versions of the captured responses in the near future, as new versions of the database in this repository.
REPORT AND REFERENCE
A compact description of the dataset, recording setup, recording procedure, and extraction can be found in:
Politis, Archontis, Adavanne, Sharath, & Virtanen, Tuomas (2020). A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2020 Workshop (DCASE2020), Tokyo, Japan.
available here. A more detailed report specifically focusing on the dataset collection and properties will follow.
AIM
The dataset can be used for generating multichannel or monophonic mixtures for testing or training of methods under realistic reverberation conditions, related to e.g. multichannel speech enhancement, acoustic scene analysis, and machine listening, among others. It is especially suitable for the following application scenarios:
monophonic and multichannel reverberant single- or multi-source speech in multi-room reverberant conditions
monophonic and multichannel polyphonic sound events in multi-room reverberant conditions
single-source and multi-source localization in multi-room reverberant conditions, in static or dynamic scenarios
single-source and multi-source tracking in multi-room reverberant conditions, in static or dynamic scenarios
sound event localization and detection in multi-room reverberant conditions, in static or dynamic scenarios
SPECIFICATIONS
The SRIRs were captured using an Eigenmike spherical microphone array. A Genelec G Three loudspeaker was used to playback a maximum length sequence (MLS) around the Eigenmike. The SRIRs were obtained in the STFT domain using a least-squares regression between the known measurement signal (MLS) and far-field recording independently at each frequency. In this version of the dataset the SRIRs and ambient noise are downsampled to 24kHz for compactness.
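To make the per-frequency least-squares idea concrete, here is a minimal sketch for a single frequency bin and a single channel. It assumes a single complex coefficient per bin, which is a simplification; the actual extraction may use a more elaborate regression. All names and values are hypothetical.

```python
import cmath


def estimate_transfer(x_frames, y_frames):
    """Least-squares H minimising sum_t |Y_t - H * X_t|^2 for one frequency bin."""
    # Closed-form solution: H = sum(Y * conj(X)) / sum(|X|^2).
    num = sum(y * x.conjugate() for x, y in zip(x_frames, y_frames))
    den = sum(abs(x) ** 2 for x in x_frames)
    return num / den


# Hypothetical STFT values of the known excitation signal at one frequency bin.
x_frames = [1 + 2j, -0.5 + 0.3j, 2 - 1j, 0.7 + 0.7j]
# Simulated noise-free recording: the excitation scaled by a known response.
true_h = 0.8 * cmath.exp(1j * 0.4)
y_frames = [true_h * x for x in x_frames]

h_hat = estimate_transfer(x_frames, y_frames)
```

Repeating the fit independently for every frequency bin and microphone channel would give a frequency-domain response per channel.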
The currently published SRIR set was recorded at nine different indoor locations inside the Tampere University campus at Hervanta, Finland. Additionally, 30 minutes of ambient noise recordings were collected at the same locations with the IR recording setup unchanged. SRIR directions and distances differ with the room. Possible azimuths span the whole range of $\phi\in[-180,180)$, while the elevations span approximately a range between $\theta\in[-45,45]$ degrees. The currently shared measured spaces are as follows:
Large open space in underground bomb shelter, with plastic-coated floor and rock walls. Ventilation noise. Circular source trajectory.
Large open gym space. Ambience of people using weights and gym equipment in adjacent rooms. Circular source trajectory.
Small classroom (PB132) with group work tables and carpet flooring. Ventilation noise. Circular source trajectory.
Meeting room (PC226) with hard floor and partially glass walls. Ventilation noise. Circular source trajectory.
Lecture hall (SA203) with inclined floor and rows of desks. Ventilation noise. Linear source trajectory.
Small classroom (SC203) with group work tables and carpet flooring. Ventilation noise. Linear source trajectory.
Large classroom (SE203) with hard floor and rows of desks. Ventilation noise. Linear source trajectory.
Lecture hall (TB103) with inclined floor and rows of desks. Ventilation noise. Linear source trajectory.
Meeting room (TC352) with hard floor and partially glass walls. Ventilation noise. Circular source trajectory.
The measurement trajectories were organised in groups, with each group being specified by a circular or linear trace at the floor at a certain distance from the z-axis of the microphone. For circular trajectories two ranges were measured, a close and a far one, except room TC352, where the same range was measured twice, but with different furniture configuration and open or closed doors. For linear trajectories also two ranges were measured, close and far, but with linear paths at either side of the array, resulting in 4 unique trajectory groups, with the exception of room SA203 where 3 ranges were measured resulting in 6 trajectory groups. Linear trajectory groups are always parallel to each other, in the same room.
Each trajectory group had multiple measurement trajectories, following the same floor path, but with the source at different heights.
The SRIRs are extracted from the noise recordings of the slowly moving source across those trajectories, at an angular spacing of approximately every 1 degree from the microphone. Instead of extracting SRIRs at equally spaced points along the path (e.g. every 20cm), this extraction scheme was found more practical for synthesis purposes, making emulation of moving sources at an approximately constant angular speed easier.
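The difference between equal angular spacing and equal spacing along the path can be sketched for a linear trajectory. The example assumes a straight path at a fixed perpendicular distance from the array, with azimuth measured from that perpendicular; the function name, the geometry convention, and the numbers are illustrative assumptions, not values from the database.

```python
import math


def linear_path_positions(distance_m, az_start_deg, az_end_deg, step_deg=1.0):
    """Points on the line y = distance_m seen at evenly spaced azimuths.

    Azimuth is measured from the perpendicular to the path (an assumption).
    """
    positions = []
    n_steps = int(round((az_end_deg - az_start_deg) / step_deg))
    for i in range(n_steps + 1):
        az = math.radians(az_start_deg + i * step_deg)
        positions.append((distance_m * math.tan(az), distance_m))
    return positions


# Hypothetical path 2 m from the array, covering -60 to 60 degrees.
pts = linear_path_positions(2.0, -60, 60)
# Distance along the path between consecutive 1-degree points.
gaps = [b[0] - a[0] for a, b in zip(pts, pts[1:])]
```

Under these assumptions the gaps grow towards the ends of the path, so points that are 1 degree apart are not equally spaced in metres.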
More details on the trajectory geometries can be found in the README file and the measinfo.mat file.
RECORDING FORMATS
As with the DCASE2019-2021 datasets, currently the database is provided in two formats, first-order Ambisonics, and a tetrahedral microphone array - both derived from the Eigenmike 32-channel recordings. For more details on the format specifications, check the README.
We intend to add additional formats of the database, of both higher resolution (e.g. higher-order Ambisonics) and lower resolution (e.g. binaural).
REFERENCE DOAs
For each extracted RIR across a measurement trajectory there is a direction-of-arrival (DOA) associated with it, which can be used as the reference direction for sound source spatialized using this RIR, for training or evaluation purposes. The DOAs were determined acoustically from the extracted RIRs, by windowing the direct sound part and applying a broadband version of the MUSIC localization algorithm on the windowed multichannel signal.
The DOAs are provided as Cartesian components [x, y, z] of unit length vectors.
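If angles are more convenient than unit vectors, a conversion along the following lines may help. It assumes azimuth is measured counter-clockwise from the x-axis in the horizontal plane and elevation upwards from that plane; the README should be checked for the convention actually used. The function name is hypothetical.

```python
import math


def cartesian_to_azi_ele(x, y, z):
    """Convert a DOA vector to (azimuth, elevation) in degrees.

    Assumed convention: azimuth from the x-axis towards the y-axis,
    elevation from the xy-plane towards the z-axis.
    """
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation


# A source directly to the side (along +y) and one directly above (along +z).
side = cartesian_to_azi_ele(0.0, 1.0, 0.0)
above = cartesian_to_azi_ele(0.0, 0.0, 1.0)
```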
SCENE GENERATOR
A set of routines is shared, here termed scene generator, that can spatialize a bank of sound samples using the SRIRs and noise recordings of this library, to emulate scenes for the two target formats. The code is similar to the one used to generate the TAU-NIGENS Spatial Sound Events 2021 dataset, and has been ported to Python from the original version written in Matlab.
The generator can be found here, along with more details on its use.
The generator at the moment is set to work with the NIGENS sound event sample database, and the FSD50K sound event database, but additional sample banks can be added with small modifications.
The dataset together with the generator has been used by the authors in the following public challenges:
DCASE 2019 Challenge Task 3, to generate the TAU Spatial Sound Events 2019 dataset (development/evaluation)
DCASE 2020 Challenge Task 3, to generate the TAU-NIGENS Spatial Sound Events 2020 dataset
DCASE2021 Challenge Task 3, to generate the TAU-NIGENS Spatial Sound Events 2021 dataset
DCASE2022 Challenge Task 3, to generate additional SELD synthetic mixtures for training the task baseline
NOTE: The current version of the generator is work-in-progress, with some code being quite "rough". If something does not work as intended or it is not clear what certain parts do, please contact us.
DATASET STRUCTURE
The dataset contains a folder of the SRIRs (TAU-SRIR_DB), with all the SRIRs per room in a single MAT file. The file rirdata.mat contains some general information such as sample rate, format specifications, and most importantly the DOAs of every extracted SRIR. The file measinfo.mat contains measurement and recording information in each room. Finally, the dataset contains a folder of spatial ambient noise recordings (TAU-SNoise_DB), with one subfolder per room having two audio recordings of the spatial ambience, one for each format, FOA or MIC. For more information on how the SRIRs and DOAs are organized, check the README.
DOWNLOAD
The files TAU-SRIR_DB.z01, ..., TAU-SRIR_DB.zip contain the SRIRs and measurement info files.
The files TAU-SNoise_DB.z01, ..., TAU-SNoise_DB.zip
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
SELECTED SOCIAL CHARACTERISTICS IN THE UNITED STATES
RESIDENCE 1 YEAR AGO - DP02
Universe - Population 1 year and over
Survey-Program - American Community Survey 5-year estimates
Years - 2020, 2021, 2022
For the ACS, people who had moved from another residence in the United States or Puerto Rico 1 year earlier were asked to report the exact address (number and street name); the name of the city, town, or post office; the name of the U.S. county or municipio in Puerto Rico; state or Puerto Rico; and the ZIP Code where they lived 1 year ago. People living outside the United States and Puerto Rico were asked to report the name of the foreign country or U.S. Island Area where they were living 1 year ago.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides a view of Bitcoin (BTC) market data by the minute, spanning from 2020 to October 24, 2023. It provides a wealth of valuable information for those interested in analyzing and understanding the minute-by-minute dynamics of the BTC market. It is suitable for algorithmic trading, neural networks, reinforcement learning, machine learning, statistical analysis, and any kind of predictive analysis.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is revised based on https://www.kaggle.com/datasets/schmadam97/nba-playbyplay-data-20182019
This dataset offers a comprehensive play-by-play log of NBA games, detailing not only scoring plays but also player movements, fouls, rebounds, and other significant actions within each game.
Understanding Society (UK Household Longitudinal Study), which began in 2009, is conducted by the Institute for Social and Economic Research (ISER) at the University of Essex and the survey research organisations Verian Group (formerly Kantar Public) and NatCen. It builds on and incorporates the British Household Panel Survey (BHPS), which began in 1991.
The Understanding Society: Calendar Year Dataset, 2020, is designed to enable cross-sectional analysis of individuals and households relating specifically to their annual interviews conducted in the year 2020, and therefore combines data collected in three waves (Waves 10, 11 and 12). It has been produced from the same data collected in the main Understanding Society study and released in the longitudinal datasets SN 6614 (End User Licence) and SN 6931 (Special Licence). Such cross-sectional analysis can, however, only involve variables that are collected in every wave in order to have data for the full sample panel. The 2020 dataset is the first of a series of planned Calendar Year Datasets to facilitate cross-sectional analysis of specific years. Full details of the Calendar Year Dataset sample structure (including why some individual interviews from 2021 are included), data structure and additional supporting information can be found in the 8988_calendar_year_dataset_2020_user_guide. As a multi-topic study, Understanding Society aims to understand the short- and long-term effects of social and economic change in the UK at the household and individual levels. The study has a strong emphasis on domains of family and social ties, employment, education, financial resources, and health. Understanding Society is an annual survey of each adult member of a nationally representative sample. The same individuals are re-interviewed in each wave approximately 12 months apart. When individuals move they are followed within the UK, and anyone joining their households is also interviewed as long as they are living with them. The fieldwork period for a single wave is 24 months. Data collection uses computer-assisted personal interviewing (CAPI) and web interviews (from wave 7), and includes a telephone mop-up.
From March 2020 (the end of wave 10 and 2nd year of wave 11), due to the coronavirus pandemic, face-to-face interviews were suspended and the survey has been conducted by web and telephone only, but otherwise has continued as before. One person completes the household questionnaire. Each person aged 16 or older participates in the individual adult interview and self-completed questionnaire. Youths aged 10 to 15 are asked to respond to a paper self-completion questionnaire. In 2020 an additional frequent web survey was separately issued to sample members to capture data on the rapid changes in people’s lives due to the COVID-19 pandemic (see SN 8644). The COVID-19 Survey data are not included in this dataset. Further information may be found on the Understanding Society main stage webpage and links to publications based on the study can be found on the Understanding Society Latest Research webpage. Co-funders In addition to the Economic and Social Research Council, co-funders for the study included the Department of Work and Pensions, the Department for Education, the Department for Transport, the Department of Culture, Media and Sport, the Department for Community and Local Government, the Department of Health, the Scottish Government, the Welsh Assembly Government, the Northern Ireland Executive, the Department of Environment and Rural Affairs, and the Food Standards Agency.
End User Licence and Special Licence versions: There are two versions of the Calendar Year 2020 data. One is available under the standard End User Licence (EUL) agreement, and the other is a Special Licence (SL) version. The SL version contains month and year of birth variables instead of just age, more detailed country and occupation coding for a number of variables and various income variables have not been top-coded (see xxxx_eul_vs_sl_variable_differences for more details). Users are advised to first obtain the standard EUL version of the data to see if they are sufficient for their research requirements. The SL data have more restrictive access conditions; prospective users of the SL version will need to complete an extra application form and demonstrate to the data owners exactly why they need access to the additional variables in order to get permission to use that version. The main longitudinal versions of the Understanding Society study may be found under SNs 6614 (EUL) and 6931 (SL).
Low- and Medium-level geographical identifiers produced for the mainstage longitudinal dataset can be used with this Calendar Year 2020 dataset, subject to SL access conditions.
This dataset includes the number of people enrolled in DSS services by town and by program from CY 2015-2024. To view the full dataset and filter the data, click the "View Data" button at the top right of the screen. More data on people served by DSS can be found here.
About this data
For privacy considerations, a count of zero is used for counts less than five. A recipient is counted in all towns where that recipient resided in that year. Due to eligibility policies and operational processes, enrollment can vary slightly after publication. Please be aware of the point-in-time nature of the published data when comparing to other data published or shared by the Department of Social Services, as this data may vary slightly.
Notes by year
2021
In March 2020, Connecticut opted to add a new Medicaid coverage group: the COVID-19 Testing Coverage for the Uninsured. Enrollment data on this limited-benefit Medicaid coverage group is being incorporated into Medicaid data effective January 1, 2021. Enrollment data for this coverage group prior to January 1, 2021, was listed under State Funded Medical. An historical accounting of enrollment of the specific coverage group starting in calendar year 2020 will also be published separately.
2018
On April 22, 2019 the methodology for determining HUSKY A Newborn recipients changed, which caused an increase of recipients for that benefit starting in October 2016. We now count recipients recorded in the ImpaCT system as well as in the HIX system for that assistance type, instead of using HIX exclusively. Also, the methodology for determining the address of the recipients changed:
1. The address of a recipient in the ImpaCT system is now correctly determined specific to that month instead of using the address of the most recent month. This resulted in some shuffling of the recipients among townships starting in October 2016.
2. If, in a given month, a recipient has benefit records in both the HIX system and in the ImpaCT system, the address of the recipient is now calculated as follows to resolve conflicts: Use the residential address in ImpaCT if it exists, else use the mailing address in ImpaCT if it exists, else use the address in HIX. This resulted in a reduction in counts for most townships starting in March 2017 because a single address is now used instead of two when the systems do not agree.
On February 14, 2019 the enrollment counts for 2012-2015 across all programs were updated to account for an error in the data integration process. As a result, the count of the number of people served increased by 13% for 2012, 10% for 2013, 8% for 2014 and 4% for 2015. Counts for 2016, 2017 and 2018 remain unchanged.
On January 16, 2019 these counts were revised to count a recipient in all locations that recipient resided in that year.
On January 1, 2019 the counts were revised to count a recipient in only one town per year even when the recipient moved within the year. The most recent address is used.
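The address fallback order and the small-count rule described in these notes can be expressed as a short sketch. The function and parameter names are hypothetical and only mirror the rules as stated; they do not reflect how the DSS systems are actually implemented.

```python
def resolve_address(impact_residential=None, impact_mailing=None, hix=None):
    """Pick one address per recipient and month using the stated priority."""
    # Priority: ImpaCT residential, then ImpaCT mailing, then HIX.
    for candidate in (impact_residential, impact_mailing, hix):
        if candidate:
            return candidate
    return None


def published_count(count, threshold=5):
    """Apply the privacy rule: counts below five are published as zero."""
    return 0 if count < threshold else count


# Hypothetical records: the ImpaCT mailing address wins over the HIX address.
chosen = resolve_address(impact_mailing="Town A", hix="Town B")
```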
This table contains 25 series, with data for years 1955 - 2013 (not all combinations necessarily have data for all years). This table contains data described by the following dimensions (Not all combinations are available): Geography (1 items: Canada ...) Last permanent residence (25 items: Total immigrants; France; Great Britain; Total Europe ...).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Standard block groups are clusters of blocks within the same census tract that have the same first digit of their 4-character census block number (e.g., Blocks 3001, 3002, 3003 to 3999 in census tract 1210.02 belong to block group 3). Current block groups do not always maintain these same block number to block group relationships due to boundary and feature changes that occur throughout the decade. For example, block 3001 might move due to a change in the census tract boundary. Even if the block is no longer in block group 3, the block number (3001) will not change. However, the GEOID for that block, identifying block group 3, would remain the same in the attribute information in the TIGER/Line Shapefiles because block GEOIDs are always built using the decennial geographic codes.
Block groups delineated for the 2020 Census generally contain 600 to 3,000 people. Local participants delineated most block groups as part of the Census Bureau's PSAP. The Census Bureau delineated block groups only where a local or tribal government declined to participate or where the Census Bureau could not identify a potential local participant.
A block group usually covers a contiguous area. Each census tract contains one or more block groups and block groups have unique numbers within census tract. Within the standard census geographic hierarchy, block groups never cross county or census tract boundaries, but may cross the boundaries of county subdivisions, places, urban areas, voting districts, congressional districts, and AIANNH areas.
Block groups have a valid range of zero (0) through nine (9). Block groups beginning with a zero generally are in coastal and Great Lakes water and territorial seas. Rather than extending a census tract boundary into the Great Lakes or out to the 3-mile territorial sea limit, the Census Bureau delineated some census tract boundaries along the shoreline or just offshore.
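For standard block groups, the relationship between block numbers, block GEOIDs, and block groups can be sketched as follows. The helper names and the sample GEOID are hypothetical; the sketch relies on the usual 15-character block GEOID layout (state, county, tract, block) and, as noted above, current block groups may not always follow the first-digit rule.

```python
def block_group_from_block(block_number):
    """Return the standard block group (first digit) of a 4-character block number."""
    if len(block_number) != 4 or not block_number.isdigit():
        raise ValueError("expected a 4-digit census block number")
    return int(block_number[0])


def block_group_geoid(block_geoid):
    """Derive the 12-character block group GEOID from a 15-character block GEOID."""
    # Layout: state (2) + county (3) + tract (6) + block (4).
    if len(block_geoid) != 15 or not block_geoid.isdigit():
        raise ValueError("expected a 15-digit block GEOID")
    # Dropping the last three block digits keeps the block group digit.
    return block_geoid[:12]


# Hypothetical GEOID: state 42, county 003, tract 1210.02, block 3001.
bg = block_group_from_block("3001")
bg_geoid = block_group_geoid("420031210023001")
```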
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NOTE: We plan to no longer update this dataset after May 22 2022.
These data sets are intended to inform researchers and public health experts about how populations are responding to physical distancing measures. In particular, there are two metrics, Change in Movement and Stay Put, that provide a slightly different perspective on movement trends. Change in Movement looks at how much people are moving around and compares it with a baseline period that predates most social distancing measures, while Stay Put looks at the fraction of the population that appear to stay within a small area during an entire day.
Full details, including the privacy protections in this data, are available here: https://research.fb.com/blog/2020/06/protecting-privacy-in-facebook-mobility-data-during-the-covid-19-response/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
EyeFi Dataset
This dataset was collected as part of the EyeFi project at Bosch Research and Technology Center, Pittsburgh, PA, USA. The dataset contains WiFi CSI values of human motion trajectories along with ground truth location information captured through a camera. This dataset is used in the paper "EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching", published in the IEEE International Conference on Distributed Computing in Sensor Systems 2020 (DCOSS '20). We also published a dataset paper titled "Dataset: Person Tracking and Identification using Cameras and Wi-Fi Channel State Information (CSI) from Smartphones" in the Data: Acquisition to Analysis 2020 (DATA '20) workshop, describing details of the data collection. Please check it out for more information on the dataset.
Data Collection Setup
In our experiments, we used an Intel 5300 WiFi Network Interface Card (NIC) installed in an Intel NUC and Linux CSI tools [1] to extract the WiFi CSI packets. The (x,y) coordinates of the subjects are collected from a Bosch Flexidome IP Panoramic 7000 panoramic camera mounted on the ceiling, and Angles of Arrival (AoAs) are derived from the (x,y) coordinates. Both the WiFi card and camera are located at the same origin coordinates but at different heights: the camera is located around 2.85m from the ground and the WiFi antennas are around 1.12m above the ground.
The data collection environment consists of two areas. The first is a rectangular space measuring 11.8m x 8.74m, and the second is an irregularly shaped kitchen area with maximum distances of 19.74m and 14.24m between two walls. The kitchen also has numerous obstacles and different materials that pose different RF reflection characteristics, including strong reflectors such as metal refrigerators and dishwashers.
To collect the WiFi data, we used a Google Pixel 2 XL smartphone as an access point and connected the Intel 5300 NIC to it for WiFi communication. The transmission rate is about 20-25 packets per second. The same WiFi card and phone are used in both the lab and kitchen areas.
List of Files
Here is a list of files included in the dataset:
|- 1_person
    |- 1_person_1.h5
    |- 1_person_2.h5
|- 2_people
    |- 2_people_1.h5
    |- 2_people_2.h5
    |- 2_people_3.h5
|- 3_people
    |- 3_people_1.h5
    |- 3_people_2.h5
    |- 3_people_3.h5
|- 5_people
    |- 5_people_1.h5
    |- 5_people_2.h5
    |- 5_people_3.h5
    |- 5_people_4.h5
|- 10_people
    |- 10_people_1.h5
    |- 10_people_2.h5
    |- 10_people_3.h5
|- Kitchen
    |- 1_person
        |- kitchen_1_person_1.h5
        |- kitchen_1_person_2.h5
        |- kitchen_1_person_3.h5
    |- 3_people
        |- kitchen_3_people_1.h5
|- training
    |- shuffuled_train.h5
    |- shuffuled_valid.h5
    |- shuffuled_test.h5
View-Dataset-Example.ipynb
README.md
In this dataset, the folders `1_person/`, `2_people/`, `3_people/`, `5_people/`, and `10_people/` contain data collected from the lab area, whereas the `Kitchen/` folder contains data collected from the kitchen area. To see how each file is structured, please see the section Access the data below.
The training folder contains the training dataset we used to train the neural network discussed in our paper. They are generated by shuffling all the data from `1_person/` folder collected in the lab area (`1_person_1.h5` and `1_person_2.h5`).
Why multiple files in one folder?
Each folder contains multiple files. For example, the `1_person` folder has two files: `1_person_1.h5` and `1_person_2.h5`. Files in the same folder always have the same number of human subjects present simultaneously in the scene. However, the person who is holding the phone can be different. Also, the data could have been collected on different days, and/or the data collection system may have needed to be rebooted due to stability issues. As a result, we provide different files (like `1_person_1.h5`, `1_person_2.h5`) to distinguish between different people holding the phone and possible system reboots that introduce different phase offsets (see below) in the system.
Special note:
The file `1_person_1.h5` was generated with the same person holding the phone throughout, whereas `1_person_2.h5` contains different people holding the phone, with only one person present in the area at a time. The two files were also collected on different days.
Access the data
To access the data, an HDF5 library is needed to open the dataset. A free HDF5 viewer is available on the official website: https://www.hdfgroup.org/downloads/hdfview/. We also provide an example Python notebook, View-Dataset-Example.ipynb, to demonstrate how to access the data.
Each file is structured as (except the files under *"training/"* folder):
|- csi_imag
|- csi_real
|- nPaths_1
    |- offset_00
        |- spotfi_aoa
    |- offset_11
        |- spotfi_aoa
    |- offset_12
        |- spotfi_aoa
    |- offset_21
        |- spotfi_aoa
    |- offset_22
        |- spotfi_aoa
|- nPaths_2
    |- offset_00
        |- spotfi_aoa
    |- offset_11
        |- spotfi_aoa
    |- offset_12
        |- spotfi_aoa
    |- offset_21
        |- spotfi_aoa
    |- offset_22
        |- spotfi_aoa
|- nPaths_3
    |- offset_00
        |- spotfi_aoa
    |- offset_11
        |- spotfi_aoa
    |- offset_12
        |- spotfi_aoa
    |- offset_21
        |- spotfi_aoa
    |- offset_22
        |- spotfi_aoa
|- nPaths_4
    |- offset_00
        |- spotfi_aoa
    |- offset_11
        |- spotfi_aoa
    |- offset_12
        |- spotfi_aoa
    |- offset_21
        |- spotfi_aoa
    |- offset_22
        |- spotfi_aoa
|- num_obj
|- obj_0
    |- cam_aoa
    |- coordinates
|- obj_1
    |- cam_aoa
    |- coordinates
...
|- timestamp
The `csi_real` and `csi_imag` datasets are the real and imaginary parts of the CSI measurements. The 90 `csi_real` and `csi_imag` values are ordered by subcarrier and then by antenna: [subcarrier1-antenna1, subcarrier1-antenna2, subcarrier1-antenna3, subcarrier2-antenna1, subcarrier2-antenna2, subcarrier2-antenna3,… subcarrier30-antenna1, subcarrier30-antenna2, subcarrier30-antenna3]. Each `nPaths_x` group holds the WiFi Angle of Arrival (AoA) calculated by SpotFi [2] with `x` multiple paths specified during the calculation. Under each `nPaths_x` group are `offset_xx` subgroups, where `xx` stands for the offset combination used to correct the phase offset during the SpotFi calculation. We measured the offsets as:
| Antennas | Offset 1 (rad) | Offset 2 (rad) |
|:--------:|:--------------:|:--------------:|
| 1 & 2 | 1.1899 | -2.0071 |
| 1 & 3 | 1.3883 | -1.8129 |
The measurement is based on the work in [3], where the authors state that there are two possible offsets between two antennas, which we measured by booting the device multiple times. The combinations of these offsets are used for the `offset_xx` naming. For example, `offset_12` means that offset 1 between antennas 1 & 2 and offset 2 between antennas 1 & 3 are used in the SpotFi calculation.
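The naming rule can be illustrated with a small lookup, using the measured values from the table above. The helper name is hypothetical, and because the text does not define what the digit 0 in `offset_00` stands for, the sketch returns `None` for it rather than guessing.

```python
# Measured phase offsets in radians, copied from the table above.
OFFSETS = {
    "1&2": {1: 1.1899, 2: -2.0071},
    "1&3": {1: 1.3883, 2: -1.8129},
}


def decode_offset_name(name):
    """Map e.g. 'offset_12' to (offset for antennas 1 & 2, offset for antennas 1 & 3)."""
    digits = name.split("_", 1)[1]
    if len(digits) != 2 or not digits.isdigit():
        raise ValueError("expected a name like 'offset_12'")
    first, second = int(digits[0]), int(digits[1])
    # Digit 0 is not defined in the description, so None is returned for it.
    return OFFSETS["1&2"].get(first), OFFSETS["1&3"].get(second)


pair = decode_offset_name("offset_12")
```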
The `num_obj` field stores the number of human subjects present in the scene. The `obj_0` group is always the subject who is holding the phone. Each file contains `num_obj` groups named `obj_x`. For each `obj_x`, we have the `coordinates` reported from the camera and `cam_aoa`, which is the AoA estimated from the camera-reported coordinates. The (x,y) coordinates and AoA listed here are chronologically ordered (except in the files in the `training` folder). They reflect the way the person carrying the phone moved in the space (for `obj_0`) and the way everyone else walked (for other `obj_y`, where `y` > 0).
The `timestamp` is provided as a time reference for each WiFi packet.
To access the data (Python):
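import h5py

# Open one recording read-only; the context manager closes the file when done.
with h5py.File('3_people_3.h5', 'r') as data:
    # [()] reads each whole dataset into memory as a NumPy array.
    csi_real = data['csi_real'][()]
    csi_imag = data['csi_imag'][()]
    # obj_0 is the subject holding the phone.
    cam_aoa = data['obj_0/cam_aoa'][()]
    cam_loc = data['obj_0/coordinates'][()]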
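Given the ordering described earlier (subcarrier first, then antenna), one packet's 90 real and 90 imaginary values can be arranged into a 30 x 3 grid of complex numbers. The following is a minimal pure-Python sketch; the function name and the toy packet are hypothetical.

```python
def csi_row_to_matrix(real_row, imag_row, n_subcarriers=30, n_antennas=3):
    """Arrange one packet's CSI values as matrix[subcarrier][antenna]."""
    expected = n_subcarriers * n_antennas
    if len(real_row) != expected or len(imag_row) != expected:
        raise ValueError("expected %d values per packet" % expected)
    values = [complex(r, i) for r, i in zip(real_row, imag_row)]
    # Consecutive values belong to the same subcarrier, one per antenna.
    return [values[s * n_antennas:(s + 1) * n_antennas]
            for s in range(n_subcarriers)]


# Hypothetical packet whose values encode their own position.
real_row = [float(k) for k in range(90)]
imag_row = [-float(k) for k in range(90)]
matrix = csi_row_to_matrix(real_row, imag_row)
```

With the NumPy arrays loaded above, an equivalent for all packets at once would be `(csi_real + 1j * csi_imag).reshape(-1, 30, 3)`, assuming each row holds one packet.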
For files inside the `training/` folder:
Files inside the training folder have a different data structure:
|- nPath-1
    |- aoa
    |- csi_imag
    |- csi_real
    |- spotfi
|- nPath-2
    |- aoa
    |- csi_imag
    |- csi_real
    |- spotfi
|- nPath-3
    |- aoa
    |- csi_imag
    |- csi_real
    |- spotfi
|- nPath-4
    |- aoa
    |- csi_imag
    |- csi_real
    |- spotfi
Each `nPath-x` group corresponds to `x` multiple paths specified during the SpotFi calculation. `aoa` is the camera-generated angle of arrival (AoA), which can be considered ground truth; `csi_imag` and `csi_real` are the imaginary and real components of the CSI values; `spotfi` holds the SpotFi-calculated AoA values. The SpotFi values are chosen based on the lowest median and mean error across `1_person_1.h5` and `1_person_2.h5`. All the rows under the same `nPath-x` group are aligned (i.e., the first row of `aoa` corresponds to the first row of `csi_imag`, `csi_real`, and `spotfi`). There is no timestamp recorded, and the sequence of the data is not chronological because the rows are randomly shuffled from the `1_person_1.h5` and `1_person_2.h5` files.
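Because the rows are aligned, the four datasets of one `nPath-x` group can be walked together. The sketch below works on any mapping of equal-length sequences, for instance arrays read from one group of a training file; the function name and the toy values are hypothetical.

```python
def iter_aligned_samples(group):
    """Yield one dict per row, pairing aoa, csi_real, csi_imag and spotfi."""
    keys = ("aoa", "csi_real", "csi_imag", "spotfi")
    lengths = {len(group[k]) for k in keys}
    if len(lengths) != 1:
        raise ValueError("datasets in the group have different lengths")
    for row in zip(*(group[k] for k in keys)):
        yield dict(zip(keys, row))


# Hypothetical stand-in for one `nPath-x` group with two rows.
group = {
    "aoa": [10.0, -5.0],
    "csi_real": [[0.1] * 90, [0.2] * 90],
    "csi_imag": [[0.0] * 90, [0.3] * 90],
    "spotfi": [12.5, -4.0],
}
samples = list(iter_aligned_samples(group))
# Absolute difference between the SpotFi estimate and the camera AoA per row.
errors = [abs(s["spotfi"] - s["aoa"]) for s in samples]
```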
Citation
If you use the dataset, please cite our paper:
@inproceedings{eyefi2020,
title={EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching},
author={Fang, Shiwei and Islam, Tamzeed and Munir, Sirajum and Nirjon, Shahriar},
booktitle={2020 IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS)},
year={2020}
}