Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides a detailed global gridded (5 arc-min resolution) annual net-migration dataset for 2000-2019. We also provide the global annual birth and death rate datasets that were used to estimate net migration for the same years. The dataset is presented in detail, with some further analyses, in the following publication. Please cite this paper when using the data.
Niva et al. 2023. World's human migration patterns in 2000-2019 unveiled by high-resolution data. Nature Human Behaviour 7: 2023–2037. Doi: https://doi.org/10.1038/s41562-023-01689-4
You can explore the data in our online net-migration explorer: https://wdrg.aalto.fi/global-net-migration-explorer/
Short introduction to the data
For the dataset, we collected, gap-filled, and harmonised:
a comprehensive national-level birth and death rate dataset for altogether 216 countries or sovereign states; and
sub-national data for births (data covering 163 countries, divided altogether into 2555 admin units) and deaths (123 countries, 2067 admin units).
These birth and death rates were downscaled with selected socio-economic indicators to a 5 arc-min grid for each year 2000-2019. This allowed us to calculate the 'natural' population change; by comparing it with the reported changes in population, we were able to estimate the annual net migration. See Niva et al. (2023) for more on the methods and calculations.
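The residual logic described above can be illustrated for a single grid cell and year. This is a minimal sketch, not the authors' implementation: the function name is hypothetical, and using the start-of-year population as the denominator is an assumption; see Niva et al. (2023) for the actual calculation.

```python
def net_migration_per_1000(pop_start, pop_end, birth_rate, death_rate):
    """Estimate net migration per 1000 people for one cell and one year.

    birth_rate and death_rate are expressed per 1000 people per year.
    Assumption: rates and the result are relative to the start-of-year population.
    """
    # Population change explained by births minus deaths (in persons).
    natural_change = pop_start * (birth_rate - death_rate) / 1000.0
    # The part of the reported change not explained by natural change
    # is attributed to net migration (in persons).
    net_migrants = (pop_end - pop_start) - natural_change
    return 1000.0 * net_migrants / pop_start


# Illustrative numbers only: 10,000 people grow to 10,150 with a birth rate
# of 20 and a death rate of 8 per 1000.
example = net_migration_per_1000(10_000, 10_150, birth_rate=20.0, death_rate=8.0)
```

In this made-up case, natural change accounts for 120 of the 150 additional people, so the remaining 30 are treated as net in-migration.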
We recommend using the data either over multiple years (we provide 3-, 5- and 20-year net-migration sums at gridded level) or aggregated over larger areas (we provide adm0, adm1 and adm2 level geospatial polygon files), because the gridded annual data contain some noise.
Due to copyright issues, we are not able to release all the original data collected, but these can be requested from the authors.
List of datasets
Birth and death rates:
raster_birth_rate_2000_2019.tif: Gridded birth rate for 2000-2019 (5 arc-min; multiband tif)
raster_death_rate_2000_2019.tif: Gridded death rate for 2000-2019 (5 arc-min; multiband tif)
tabulated_adm1adm0_birth_rate.csv: Tabulated sub-national birth rate for 2000-2019 at the division for which data was collected (subnational data when available, otherwise national)
tabulated_adm1adm0_death_rate.csv: Tabulated sub-national death rate for 2000-2019 at the division for which data was collected (subnational data when available, otherwise national)
Net-migration:
raster_netMgr_2000_2019_annual.tif: Gridded annual net-migration 2000-2019 (5 arc-min; multiband tif)
raster_netMgr_2000_2019_3yrSum.tif: Gridded 3-yr sum net-migration 2000-2019 (5 arc-min; multiband tif)
raster_netMgr_2000_2019_5yrSum.tif: Gridded 5-yr sum net-migration 2000-2019 (5 arc-min; multiband tif)
raster_netMgr_2000_2019_20yrSum.tif: Gridded 20-yr sum net-migration 2000-2019 (5 arc-min)
polyg_adm0_dataNetMgr.gpkg: National (adm 0 level) net-migration geospatial file (gpkg)
polyg_adm1_dataNetMgr.gpkg: Provincial (adm 1 level) net-migration geospatial file (gpkg) (if not adm 1 level division, adm 0 used)
polyg_adm2_dataNetMgr.gpkg: Communal (adm 2 level) net-migration geospatial file (gpkg) (if not adm 2 level division, adm 1 used; and if not adm 1 level division either, adm 0 used)
Files to run online net migration explorer
masterData.rds and admGeoms.rds are related to our online ‘Net-migration explorer’ tool (https://wdrg.aalto.fi/global-net-migration-explorer/). The source code of this application is available at https://github.com/vvirkki/net-migration-explorer. Running the application locally requires these two .rds files from this repository.
Metadata
Grids:
Resolution: 5 arc-min (0.083333333 degrees)
Spatial extent: -180, 180; -90, 90 (xmin, xmax, ymin, ymax)
Coordinate ref system: EPSG:4326 - WGS 84
Format: Multiband geotiff; each band for each year over 2000-2019
Units:
Birth and death rates: births/deaths per 1000 people per year
Net-migration: persons per 1000 people per time period (year, 3yr, 5yr, 20yr, depending on the dataset)
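The grid metadata above is enough to locate a cell and a band for a given place and year. The sketch below rests on two assumptions that are not stated in the metadata: rows are counted from the northern edge (the usual north-up GeoTIFF layout), and the bands of the annual files are in chronological order with band 1 = 2000. All names are hypothetical.

```python
import math

CELLS_PER_DEGREE = 12      # 5 arc-min cells: 60 / 5 = 12 cells per degree
N_COLS = 360 * CELLS_PER_DEGREE   # 4320 columns for lon -180..180
N_ROWS = 180 * CELLS_PER_DEGREE   # 2160 rows for lat -90..90
FIRST_YEAR, LAST_YEAR = 2000, 2019


def lonlat_to_rowcol(lon, lat):
    """Return the zero-based (row, col) of the cell containing lon/lat.

    Assumes row 0 is the northernmost row and col 0 the westernmost column.
    """
    col = math.floor((lon + 180.0) * CELLS_PER_DEGREE)
    row = math.floor((90.0 - lat) * CELLS_PER_DEGREE)
    # Points exactly on the eastern or southern edge fall into the last cell.
    col = min(max(col, 0), N_COLS - 1)
    row = min(max(row, 0), N_ROWS - 1)
    return row, col


def band_for_year(year):
    """Return the one-based band number, assuming band 1 holds year 2000."""
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError(f"year must be within {FIRST_YEAR}-{LAST_YEAR}")
    return year - FIRST_YEAR + 1
```

The band mapping applies to the annual files only; the band layout of the 3-yr and 5-yr sum files is not described here and should be checked in the files themselves.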
Geospatial polygon (gpkg) files:
Spatial extent: -180, 180; -90, 83.67 (xmin, xmax, ymin, ymax)
Temporal extent: annual over 2000-2019
Coordinate ref system: EPSG:4326 - WGS 84
Format: gpkg
Units:
Net-migration: persons per 1000 people per year
https://creativecommons.org/publicdomain/zero/1.0/
People across India scrambled for life-saving oxygen supplies on Friday and patients lay dying outside hospitals as the capital recorded the equivalent of one death from COVID-19 every five minutes.
For the second day running, the country’s overnight infection total was higher than ever recorded anywhere in the world since the pandemic began last year, at 332,730.
India’s second wave has hit with such ferocity that hospitals are running out of oxygen, beds, and anti-viral drugs. Many patients have been turned away because there was no space for them, doctors in Delhi said.
Mass cremations have been taking place as the crematoriums have run out of space. Ambulance sirens sounded throughout the day in the deserted streets of the capital, one of India’s worst-hit cities, where a lockdown is in place to try and stem the transmission of the virus.
The dataset consists of tweets posted with the #IndiaWantsOxygen hashtag over the past week. It contains 25,440 tweets in total and will be updated on a daily basis.
The description of the features is given below.

| No | Columns | Descriptions |
| -- | -- | -- |
| 1 | user_name | The name of the user, as they’ve defined it. |
| 2 | user_location | The user-defined location for this account’s profile. |
| 3 | user_description | The user-defined UTF-8 string describing their account. |
| 4 | user_created | Time and date, when the account was created. |
| 5 | user_followers | The number of followers an account currently has. |
| 6 | user_friends | The number of friends an account currently has. |
| 7 | user_favourites | The number of favorites an account currently has |
| 8 | user_verified | When true, indicates that the user has a verified account |
| 9 | date | UTC time and date when the Tweet was created |
| 10 | text | The actual UTF-8 text of the Tweet |
| 11 | hashtags | All the other hashtags posted in the tweet along with #IndiaWantsOxygen |
| 12 | source | Utility used to post the Tweet, Tweets from the Twitter website have a source value - web |
| 13 | is_retweet | Indicates whether this Tweet has been Retweeted by the authenticating user. |
https://globalnews.ca/news/7785122/india-covid-19-hospitals-record/ Image courtesy: BBC and Reuters
The past few days have been really depressing after seeing these incidents. These tweets are the voice of Indians requesting help and of people all over the globe asking their own countries to support India by providing oxygen tanks.
And I strongly believe that this is not just some data, but the pure emotions of people and their call for help. And I hope we as data scientists could contribute on this front by providing valuable information and insights.
[Edit 12/09/2020] The files below now contain only the last 30 days, because too many people ignore the request not to fetch the dataset too often (there is no point in fetching it every minute when the file changes only 4 or 5 times a day). If you want access to the entire history, contact me.
[Edit 31/03/2020] Since yesterday, I have been retrieving the current day's data from the CSSE, so same-day data are now available and updated several times a day (about every hour) as new figures come in from around the world. The previous day's data are always consolidated around 2am (no longer 1am, since the clock change). If you only want complete data, simply ignore the last day (today’s date).
Here I share the data that I compile from the well-known coronavirus infection world map created and maintained by Johns Hopkins University, which I use to display CoronaVirus statistics worldwide and by country. They publish each day’s data every night in a GitHub repository. My tools compile this new data as soon as it is available, and I share the result here.
This data is used to display tables and graphs on the CoronaVirus (Covid19) website of Politologue.com: https://coronavirus.politologue.com/
This data will allow you to make your own graphs and analyses if you are interested in the subject. I do not require it, but if my compilation lets you build something and saved you time, a link to https://coronavirus.politologue.com/ would be appreciated.
Information in the files (csv and json):
- Number of cases
- Number of deaths
- Number of recoveries
- Death rate (percentage)
- Recovery rate (percentage)
- Infection rate (persons still infected, not deceased or recovered) (percentage)
- For data by country, a field “country”
If you integrate the json or csv client-side on a site or application, please keep a cache on your servers so as not to risk an unexpected load on my servers.
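The three percentage fields can plausibly be reproduced from the three counts. The sketch below is an assumption about how they relate, based only on the field descriptions above: each rate is taken as a share of the cumulative number of cases. The function and key names are hypothetical and are not the column names used in the files.

```python
def covid_rates(cases, deaths, recovered):
    """Return death, recovery and infection rates as percentages of cases.

    Assumption: all three rates use the cumulative case count as denominator.
    """
    if cases <= 0:
        # No reported cases: every rate is defined as zero here.
        return {"death_rate": 0.0, "recovery_rate": 0.0, "infection_rate": 0.0}
    # People still infected are those neither deceased nor recovered.
    still_infected = cases - deaths - recovered
    return {
        "death_rate": 100.0 * deaths / cases,
        "recovery_rate": 100.0 * recovered / cases,
        "infection_rate": 100.0 * still_infected / cases,
    }


# Illustrative numbers only.
rates = covid_rates(cases=200, deaths=10, recovered=50)
```

Under this assumption the three rates always add up to 100%, which gives a quick consistency check against the published values.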
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
This is a dataset I started building for my future personal projects, as I think this kind of data is quite hard to acquire for free and quickly. I started acquiring data on March 21st, 2020 and intend to keep doing so continuously.
What you'll have inside this are news extracted from the following sources:
Every 20 minutes, a script checks for new headlines on these sources and adds them to a database. This CSV file is generated from that database.
I intend to update this dataset every day if I can (and if the machine I run this script on is up).
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
The “richness index” represents the level of economic wellbeing of a given area of a country in 2010. Regions with higher income per capita, lower poverty rates and better access to markets are wealthier and are therefore better able to prepare for and respond to adversity. The index results from the second cluster of the Principal Component Analysis performed on 9 potential variables. The analysis identifies four dominant variables, namely “GDPppp per capita”, “agriculture share GDP per agriculture sector worker”, “poverty rate” and “market accessibility”, assigning weights of 0.33, 0.26, 0.25 and 0.16, respectively. Before the analysis, all variables were log-transformed (except “agriculture share GDP per agriculture sector worker”) to reduce extreme variation and then score-standardized (converted to a distribution with a mean of 0 and a standard deviation of 1; the inverse method was applied to “poverty rate” and “market accessibility”) in order to be comparable. The 0.5 arc-minute grid of total GDPppp is based on NOAA night-time light satellite imagery (see Ghosh, T., Powell, R., Elvidge, C. D., Baugh, K. E., Sutton, P. C., & Anderson, S. (2010). Shedding light on the global distribution of economic activity. The Open Geography Journal (3), 148-161) and adjusted to the national totals recorded by the International Monetary Fund for 2010. “GDPppp per capita” was calculated by dividing the total GDPppp by the population in each pixel. A focal statistic was then run to determine mean values within 10 km. This had a smoothing effect and represents some of the extended influence of intense economic activity on local people. Country-based data for “agriculture share GDP per agriculture sector worker” were calculated as the fraction of GDPppp (data from the International Monetary Fund) from agricultural activity (measured by the World Bank) divided by the number of workers in the agriculture sector (data from the World Bank).
The tabular data represent the average for the period 2008-2012; they were linked by country unit to the national boundaries shapefile (FAO/GAUL) and then converted into raster format (resolution 0.5 arc-minute). The first-administrative-level data for “poverty rate” were estimated by NOAA for 2003 using nighttime lights satellite imagery. The tabular data were linked by first administrative unit to the first administrative boundaries shapefile (FAO/GAUL) and then converted into raster format (resolution 0.5 arc-minute). The 0.5 arc-minute “market accessibility” grid measures the travel time in minutes to large cities (with populations greater than 50,000 people). This dataset was developed by the European Commission and the World Bank to represent access to markets, schools, hospitals, etc. The dataset captures the connectivity and the concentration of economic activity (in 2000). Markets may be important for a variety of reasons, including their ability to spread risk and increase incomes. Markets are a means of linking people both spatially and over time. That is, they allow shocks (and risks) to be spread over wider areas. In particular, markets should make households less vulnerable to (localized) covariate shocks. This dataset has been produced in the framework of the “Climate change predictions in Sub-Saharan Africa: impacts and adaptations (ClimAfrica)” project, Work Package 4 (WP4). More information on the ClimAfrica project is provided in the Supplemental Information section of this metadata.
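The weighting scheme described above can be sketched as a weighted sum of standardized variables. This is an illustration under stated assumptions, not the project's code: the variable names are hypothetical, the "inverse method" is interpreted here as a sign flip of the z-score, and the population standard deviation is used for standardization.

```python
import math
from statistics import mean, pstdev

# Weights reported in the description (they sum to 1.0).
WEIGHTS = {
    "gdp_ppp_per_capita": 0.33,
    "agri_gdp_per_worker": 0.26,
    "poverty_rate": 0.25,
    "market_access_minutes": 0.16,
}
# All variables except the agriculture one are log-transformed.
LOG_TRANSFORMED = {"gdp_ppp_per_capita", "poverty_rate", "market_access_minutes"}
# Higher values mean lower wellbeing, so the score is inverted (assumed: sign flip).
INVERTED = {"poverty_rate", "market_access_minutes"}


def zscores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]  # fails if all values are equal


def richness_index(cells):
    """cells: list of dicts with one strictly positive value per variable."""
    standardized = {}
    for name in WEIGHTS:
        column = [cell[name] for cell in cells]
        if name in LOG_TRANSFORMED:
            column = [math.log(v) for v in column]
        scores = zscores(column)
        if name in INVERTED:
            scores = [-s for s in scores]
        standardized[name] = scores
    return [
        sum(WEIGHTS[name] * standardized[name][i] for name in WEIGHTS)
        for i in range(len(cells))
    ]


# Two made-up cells: a wealthier one and a poorer one.
cells = [
    {"gdp_ppp_per_capita": 20000, "agri_gdp_per_worker": 5000,
     "poverty_rate": 5, "market_access_minutes": 30},
    {"gdp_ppp_per_capita": 1000, "agri_gdp_per_worker": 500,
     "poverty_rate": 60, "market_access_minutes": 600},
]
index = richness_index(cells)
```

Because every input is standardized to mean 0, the resulting index is also centred on 0, with positive values indicating above-average wellbeing within the set of cells considered.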
Data publication: 2014-05-15
Supplemental Information:
ClimAfrica was an international project funded by the European Commission under the 7th Framework Programme (FP7) for the period 2010-2014. The ClimAfrica consortium was formed by 18 institutions: 9 from Europe, 8 from Africa, and the Food and Agriculture Organization of the United Nations (FAO).
ClimAfrica was conceived to respond to the urgent international need for the most appropriate and up-to-date tools and methodologies to better understand and predict climate change, assess its impact on African ecosystems and population, and develop the correct adaptation strategies. Africa is probably the continent most vulnerable to climate change and climate variability, and it shows a diverse range of agro-ecological and geographical features. Thus the impacts of climate change can be very high and can differ greatly across the continent, and even within countries.
The project focused on the following specific objectives:
Develop improved climate predictions on seasonal to decadal climatic scales, especially relevant to SSA;
Assess climate impacts in key sectors of SSA livelihood and economy, especially water resources and agriculture;
Evaluate the vulnerability of ecosystems and civil population to inter-annual variations and longer trends (10 years) in climate;
Suggest and analyse new, suitable adaptation strategies focused on local needs;
Develop a new concept of a 10-year monitoring and forecasting warning system, useful for food security, risk management and civil protection in SSA;
Analyse the economic impacts of climate change on agriculture and water resources in SSA and the cost-effectiveness of potential adaptation measures.
The work of the ClimAfrica project was broken down into the following closely connected work packages (WPs). All the activities described in WP1, WP2, WP3, WP4 and WP5 cover the entire Sub-Saharan Africa region. Only WP6 has a country-specific (watershed) spatial scale, at which model validation and detailed process analysis are carried out.
Contact points:
Metadata Contact: FAO-Data
Resource Contact: Selvaraju Ramasamy
Resource constraints:
copyright
Online resources:
Project deliverable D4.1 - Scenarios of major production systems in Africa
Climafrica Website - Climate Change Predictions In Sub-Saharan Africa: Impacts And Adaptations
This is the first live data stream on Kaggle providing a simple yet rich source of all soccer matches around the world 24/7 in real-time.
What makes it unique compared to other datasets?
Simply train your algorithm on the first version of training dataset of approximately 11.5k matches and predict the data provided in the following data feed.
The CSV file is updated every 30 minutes, at minutes 20’ and 50’ of every hour. I kindly ask that you not download it more than twice per hour, as each download incurs additional cost.
You may download the CSV data file from the Amazon S3 server at the following link, replacing FOLDER_NAME as described below:
https://s3.amazonaws.com/FOLDER_NAME/amasters.csv
*. Substitute FOLDER_NAME with "analyst-masters".
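To stay within the requested two downloads per hour, a client can wait for the next scheduled refresh instead of polling. The sketch below only builds the URL and computes the next refresh time; it performs no download. The function names are hypothetical, and it assumes the local clock shares the minute-of-hour with the server, which holds for time zones with whole-hour offsets.

```python
from datetime import datetime, timedelta

UPDATE_MINUTES = (20, 50)  # the file is refreshed at hh:20 and hh:50


def feed_url(folder_name="analyst-masters"):
    """Build the download URL by substituting FOLDER_NAME."""
    return f"https://s3.amazonaws.com/{folder_name}/amasters.csv"


def next_update(now):
    """Return the first scheduled refresh strictly after `now`."""
    for minute in UPDATE_MINUTES:
        candidate = now.replace(minute=minute, second=0, microsecond=0)
        if candidate > now:
            return candidate
    # Past hh:50, so the next refresh is at 20 minutes past the following hour.
    following_hour = now + timedelta(hours=1)
    return following_hour.replace(minute=UPDATE_MINUTES[0], second=0, microsecond=0)
```

A polite client would fetch once shortly after each value returned by `next_update` and serve all other requests from a local copy.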
Our goal is to identify the outcome of a match as Home, Draw or Away. The variety of sources and the nature of the information provided in this data stream make it a unique database. Currently, FIVE servers are collecting data from soccer matches around the world, communicating with each other and finally aggregating the data based on the dominant features learned from 400,000 matches over 7 years. I describe every column and the data collection below in two categories: Category I (Current situation) and Category II (Head-to-Head History). Hence, we divide the type of data we have from each team into 4 modes,
Below you can find a full illustration of each category.
I. Current situation
Col 1 to 3:
Votes_for_Home Votes_for_Draw Votes_for_Away
The most distinctive parts of the database are these 3 columns. We are releasing the opinions of over 100 professional soccer analysts predicting the outcome of a match. Their votes are the result of every piece of information they receive on players, team line-ups, injuries and the urge of a team to win a match to stay in the league. They are spread around the world in various time zones and are experts on soccer teams from various regions. Our servers aggregate their opinions to update the CSV file until kickoff. Therefore, even if 40 users predict that Real-Madrid will win against Real-Sociedad at the Santiago Bernabeu on January 6th, 2019, but 5 users predict that Real-Sociedad (the away team) will be the winner, you should doubt the home win. Here, the “majority of votes” works in conjunction with other features.
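One simple way to turn the three vote columns into model features is to convert the raw counts into shares. This is only an illustrative sketch with a hypothetical function name; it is not how the servers aggregate the votes, and it does not encode any rule about when to trust the majority.

```python
def vote_shares(votes_for_home, votes_for_draw, votes_for_away):
    """Convert raw analyst vote counts into shares that sum to 1."""
    total = votes_for_home + votes_for_draw + votes_for_away
    if total == 0:
        # No votes recorded yet for this match.
        return {"Home": 0.0, "Draw": 0.0, "Away": 0.0}
    return {
        "Home": votes_for_home / total,
        "Draw": votes_for_draw / total,
        "Away": votes_for_away / total,
    }


# The example from the text: 40 votes for the home team, 5 for the away team.
shares = vote_shares(40, 0, 5)
majority = max(shares, key=shares.get)
```

Keeping the minority share as a feature, rather than only the majority label, preserves the signal that a few analysts disagree.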
Col 4 to 9:
Weekday Day Month Year Hour Minute
There are over 60,000 matches during a year, and approximately 400 are usually held per day on weekends. More critical and exciting matches, which are usually less predictable, are held toward the evening in Europe. We are currently providing time in Central European Time (CET), equivalent to GMT +01:00.
*. Please note that the 2nd row of the CSV file represents the time at which data values are saved from all servers to the file.
Col 10 to 13:
Total_Bettors Bet_Perc_on_Home Bet_Perc_on_Draw Bet_Perc_on_Away
This data is recorded a few hours before the match as people place bets emotionally when kickoff approaches. The percentage of the overall number of people denoted as “Total_Bettors” is indicated in each column for “Home,” “Draw” and “Away” outcomes.
Col 14 to 15:
Team_1 Team_2
The team playing “Home” is “Team_1” and the opponent playing “Away” is “Team_2”.
Col 16 to 36:
League_Rank_1 League_Rank_2 Total_teams Points_1 Points_2 Max_points Min_points Won_1 Draw_1 Lost_1 Won_2 Draw_2 Lost_2 Goals_Scored_1 Goals_Scored_2 Goals_Rec_1 Goal_Rec_2 Goals_Diff_1 Goals_Diff_2
If the match is betw...
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
'In-the-Wild' Dataset
We present a dataset of audio deepfakes (and corresponding benign audio) for a set of politicians and other public figures, collected from publicly available sources such as social networks and video streaming platforms. For n = 58 celebrities and politicians, we collect both bona-fide and spoofed audio. In total, we collect 20.8 hours of bona-fide and 17.2 hours of spoofed audio. On average, there are 23 minutes of bona-fide and 18 minutes of spoofed audio per speaker.
The dataset is intended to be used for evaluating deepfake detection and voice anti-spoofing machine-learning models. It is especially useful to judge a model's capability to generalize to realistic, in-the-wild audio samples. Find more information in our paper, and download the dataset here.
The most interesting deepfake detection models we used in our experiments are open-source on GitHub:
RawNet 2
RawGAT-ST
PC-Darts
This dataset and the associated documentation are licensed under the Apache License, Version 2.0.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Worldwide Soundscapes project is a global, open inventory of spatio-temporally replicated soundscape datasets. This Zenodo entry comprises the data tables that constitute its (meta-)database, as well as their description.
The overview of all sampling sites can be found on the corresponding project on ecoSound-web, as well as a demonstration collection containing selected recordings. More information on the project can be found here and on ResearchGate.
The audio recording criteria justifying inclusion into the meta-database are:
The individual columns of the provided data tables are described in the following. Data tables are linked through primary keys; joining them will result in a database.
datasets
datasets-sites
sites
deployments
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset and the validation are fully described in a Nature Scientific Data Descriptor https://www.nature.com/articles/s41597-019-0265-5
If you want to use this dataset in an interactive environment, then use this link https://mybinder.org/v2/gh/GeographerAtLarge/TravelTime/HEAD
The following text is a summary of the information in the above Data Descriptor.
The dataset is a suite of global travel-time accessibility indicators for the year 2015, at approximately one-kilometre spatial resolution for the entire globe. The indicators show an estimated (and validated), land-based travel time to the nearest city and nearest port for a range of city and port sizes.
The datasets are in GeoTIFF format and are suitable for use in Geographic Information Systems and statistical packages for mapping access to cities and ports and for spatial and statistical analysis of the inequalities in access by different segments of the population.
These maps represent a unique global representation of physical access to essential services offered by cities and ports.
The datasets
travel_time_to_cities_x.tif (where x has values from 1 to 12)
The value of each pixel is the estimated travel time in minutes to the nearest urban area in 2015. There are 12 data layers based on different sets of urban areas, defined by their population in year 2015 (see PDF report).
travel_time_to_ports_x (x ranges from 1 to 5)
The value of each pixel is the estimated travel time to the nearest port in 2015. There are 5 data layers based on different port sizes.
Format Raster Dataset, GeoTIFF, LZW compressed
Unit Minutes
Data type Byte (16 bit Unsigned Integer)
No data value 65535
Flags None
Spatial resolution 30 arc seconds
Spatial extent
Upper left -180, 85
Lower left -180, -60
Upper right 180, 85
Lower right 180, -60
Spatial Reference System (SRS) EPSG:4326 - WGS84 - Geographic Coordinate System (lat/long)
Temporal resolution 2015
Temporal extent Updates may follow for future years, but these are dependent on the availability of updated inputs on travel times and city locations and populations.
Methodology
Travel time to the nearest city or port was estimated using an accumulated cost function (accCost) in the gdistance R package (van Etten, 2018). This function requires two input datasets: (i) a set of locations to estimate travel time to and (ii) a transition matrix that represents the cost or time to travel across a surface.
The set of locations were based on populated urban areas in the 2016 version of the Joint Research Centre’s Global Human Settlement Layers (GHSL) datasets (Pesaresi and Freire, 2016) that represent low density (LDC) urban clusters and high density (HDC) urban areas (https://ghsl.jrc.ec.europa.eu/datasets.php). These urban areas were represented by points, spaced at 1km distance around the perimeter of each urban area.
Marine ports were extracted from the 26th edition of the World Port Index (NGA, 2017), which contains the location and physical characteristics of approximately 3,700 major ports and terminals. Ports are represented as single points.
The transition matrix was based on the friction surface (https://map.ox.ac.uk/research-project/accessibility_to_cities) from the 2015 global accessibility map (Weiss et al, 2018).
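The idea of an accumulated cost surface can be illustrated with a small shortest-path computation on a grid. This sketch is not the gdistance `accCost` implementation: it is a plain Dijkstra search in Python, with 4-neighbour moves and a step cost equal to the mean friction of the two cells, both of which are simplifying assumptions. The friction values are treated as minutes needed to cross one cell.

```python
import heapq


def accumulated_cost(friction, targets):
    """Return the least accumulated travel cost from every cell to any target.

    friction: 2D list, minutes needed to cross each cell (illustrative units).
    targets:  list of (row, col) cells representing cities or ports.
    """
    n_rows, n_cols = len(friction), len(friction[0])
    cost = [[float("inf")] * n_cols for _ in range(n_rows)]
    heap = []
    for r, c in targets:
        cost[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > cost[r][c]:
            continue  # stale queue entry; a cheaper route was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n_rows and 0 <= nc < n_cols:
                # Assumed step cost: average friction of the two cells.
                step = (friction[r][c] + friction[nr][nc]) / 2.0
                if d + step < cost[nr][nc]:
                    cost[nr][nc] = d + step
                    heapq.heappush(heap, (d + step, nr, nc))
    return cost


# A slow cell (10) in the middle of an otherwise fast (1) surface.
friction = [[1, 1, 1],
            [1, 10, 1],
            [1, 1, 1]]
cost = accumulated_cost(friction, targets=[(0, 0)])
```

In the toy grid, the far corner is reached by going around the slow central cell, which is the behaviour the accumulated cost approach is meant to capture.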
Code
The R code used to generate the 12 travel time maps is included in the zip file that can be downloaded with these data layers. The processing zones are also available.
Validation
The underlying friction surface was validated by comparing travel times between 47,893 pairs of locations against journey times from a Google API. Our estimated journey times were generally shorter than those from the Google API. Across the tiles, the median journey time from our estimates was 88 minutes within an interquartile range of 48 to 143 minutes while the median journey time estimated by the Google API was 106 minutes within an interquartile range of 61 to 167 minutes. Across all tiles, the differences were skewed to the left and our travel time estimates were shorter than those reported by the Google API in 72% of the tiles. The median difference was −13.7 minutes within an interquartile range of −35.5 to 2.0 minutes while the absolute difference was 30 minutes or less for 60% of the tiles and 60 minutes or less for 80% of the tiles. The median percentage difference was −16.9% within an interquartile range of −30.6% to 2.7% while the absolute percentage difference was 20% or less in 43% of the tiles and 40% or less in 80% of the tiles.
This process and results are included in the validation zip file.
Usage Notes
The accessibility layers can be visualised and analysed in many Geographic Information Systems or remote sensing software such as QGIS, GRASS, ENVI, ERDAS or ArcMap, and also by statistical and modelling packages such as R or MATLAB. They can also be used in cloud-based tools for geospatial analysis such as Google Earth Engine.
The nine layers represent travel times to human settlements of different population ranges. Two or more layers can be combined into one layer by recording the minimum pixel value across the layers. For example, a map of travel time to the nearest settlement of 5,000 to 50,000 people could be generated by taking the minimum of the three layers that represent the travel time to settlements with populations between 5,000 and 10,000, 10,000 and 20,000 and, 20,000 and 50,000 people.
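Combining layers by the per-pixel minimum can be sketched as follows. The example uses small nested lists in place of real rasters and a hypothetical function name; the only detail taken from the metadata is the no-data value of 65535, which is skipped so that a pixel stays no-data only when it is missing in every input layer.

```python
NODATA = 65535  # no-data value given in the dataset metadata


def combine_min(layers):
    """Return the per-pixel minimum travel time across equally sized layers."""
    n_rows, n_cols = len(layers[0]), len(layers[0][0])
    combined = [[NODATA] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            # Ignore layers that have no data at this pixel.
            valid = [layer[r][c] for layer in layers if layer[r][c] != NODATA]
            if valid:
                combined[r][c] = min(valid)
    return combined


# Two made-up 2x2 layers of travel times in minutes.
layer_a = [[10, NODATA], [30, NODATA]]
layer_b = [[20, 5], [NODATA, NODATA]]
combined = combine_min([layer_a, layer_b])
```

With real GeoTIFFs, the same logic would be applied to arrays read with a raster library, after confirming that the layers share the same extent and resolution.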
The accessibility layers also permit user-defined hierarchies that go beyond computing the minimum pixel value across layers. A user-defined complete hierarchy can be generated when the union of all categories adds up to the global population, and the intersection of any two categories is empty. Everything else is up to the user in terms of logical consistency with the problem at hand.
The accessibility layers are relative measures of the ease of access from a given location to the nearest target. While the validation demonstrates that they do correspond to typical journey times, they cannot be taken to represent actual travel times. Errors in the friction surface will be accumulated as part of the accumulative cost function, and it is likely that locations that are further away from targets will have a greater divergence from a plausible travel time than those that are closer to the targets. Care should be taken when referring to travel time to the larger cities when the locations of interest are extremely remote, although they will still be plausible representations of relative accessibility. Furthermore, a key assumption of the model is that all journeys will use the fastest mode of transport and take the shortest path.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The CarDA dataset [1] (Car Door Assembly dataset) has been designed and captured to provide a comprehensive, multi-modal resource for analyzing car door assembly activities performed by trained line workers in realistic assembly lines.
It comprises a set of time-synchronized multi-camera RGB-D videos and human motion capture data acquired during car door assembly activities performed by real-line workers in a real manufacturing environment.
Deployment environment:
The use-case scenario concerns a real-world assembly line workplace in an automotive manufacturing industry, as the deployment environment. In this context, line workers simulate the real car door assembly workflow using the prompts, sequences, and tools under very similar ergonomic and environmental conditions as in existing factory shop floors.
The assembly line involves a conveyor belt that is divided into three virtually separated work areas, corresponding to three assembly workstations (WS10, WS20, and WS30). The belt moves at a low, constant speed, carrying cart-mounted car doors and material storage. A line worker is assigned to each workstation, and all workers assemble car doors as the belt moves. Each worker completes a workstation-specific set of assembly actions, referred to as a task cycle. Upon successful completion of the task cycle, the cart travels to the virtually defined area of the subsequent workstation, where another line worker continues the assembly process in a new task cycle. Each task cycle lasts approximately 4 minutes and is repeated continuously during the worker’s shift.
Data acquisition:
Data acquisition involves low-cost, passive RGB-D camera sensors that are installed at stationary locations alongside the car door assembly line and a motion capture system for capturing time-synchronized sequences of images and motion capture data during car door assembly activities performed by real line workers.
Two stationary StereoLabs ZED2 stereo cameras were installed in each of the three workstations of the car door assembly line. The two stationary, workstation-specific cameras are located at bilateral positions on the two sides of the conveyor belt at the center of the area concerning that specific workstation.
The pair of RGB-D sensors was used to acquire stereo color and depth image sequences during car door task cycle executions. Each recording comprises time-synchronized RGB (color) and depth image sequences captured throughout a task cycle execution at 30 frames per second (fps).
At the same time, the line worker used a wearable XSens MVN Link suit during work activities to acquire time-synced 3D motion capture data at 60 fps.
Note: Time synchronization between pairs of RGB-D (.svo) recordings (pairs captured during an assembly task cycle simultaneously from the inXX and outXX cameras installed by the wsXX) is guaranteed and relies on the StereoLabs ZED SDK acquisition software. Time synchronization between samples of the RGB-D and mp4 videos (30 fps) and the acquired motion capture data (60 fps) was performed manually with the starting frame/time of the video as a reference time. We have observed some time discrepancies between data samples of the two modalities that might occur after the first 40-50 seconds in some recordings.
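Given the two frame rates and the video start as the reference time, a video frame can be mapped to the nearest motion capture sample as sketched below. The helper names, the optional offset parameter and the 40-second threshold used for the reliability check are illustrative assumptions, not part of the dataset's tooling.

```python
VIDEO_FPS = 30   # RGB-D and mp4 recordings
MOCAP_FPS = 60   # XSens motion capture data


def mocap_index_for_video_frame(video_frame, mocap_start_offset=0):
    """Map a zero-based video frame to the matching motion capture sample.

    mocap_start_offset is the (hypothetical) mocap sample index that
    coincides with the first video frame.
    """
    # 60 fps / 30 fps = 2 mocap samples per video frame.
    return mocap_start_offset + (video_frame * MOCAP_FPS) // VIDEO_FPS


def within_reliable_window(video_frame, limit_seconds=40):
    """True if the frame lies before the point where drift may appear."""
    return video_frame / VIDEO_FPS < limit_seconds
```

For frames beyond the first 40 seconds, the mapped index is best treated as approximate and checked against a visible event in both modalities.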
CarDA Dataset:
The dataset has been split into two subsets, A and B.
Each comprises data acquired at different periods using the same multicamera system in the same manufacturing environment.
Subset A contains recordings of RGB-D videos, mp4 videos, and 3d human motion capture data (using the XSens MVN Link suit) acquired during car door assembly activities in all three workstations.
Subset B contains recordings of RGB-D videos and mp4 videos acquired during car door assembly activities in all three workstations.
CarDA subset A
It contains:
CarDA subset A files:
CarDA subset B
It contains:
CarDA subset B files:
Contact:
Konstantinos Papoutsakis, PhD: papoutsa@ics.forth.gr
Maria Pateraki: mpateraki@mail.ntua.gr
Assistant Professor | National Technical University of Athens
Affiliated Researcher | Institute of Computer Science | FORTH
References:
[1] Konstantinos Papoutsakis, Nikolaos Bakalos, Konstantinos Fragkoulis, Athena Zacharia, Georgia Kapetadimitri, and Maria Pateraki. A vision-based framework for human behavior understanding in industrial assembly lines. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops - T-CAP 2024 Towards a Complete Analysis of People: Fine-grained Understanding for Real-World Applications, 2024.
Autism Spectrum Condition (ASC) is still poorly understood. As its name suggests, it covers a very broad range of conditions, which makes it difficult to define and address. Important research is under way in genetics to understand its origin, causes, and typification, but no major advances have been made yet.
In the meantime, people with ASC need pragmatic solutions to help them in a number of aspects of their daily lives, such as social interaction. ICT has become an important source of intervention and therapeutic tools in the last 10 years. However, there is a complete lack of sharing of data from trials of ICT tools for ASC. Such data could help many researchers compare results and build research in different directions from the same data. Among the available ICT tools, Embodied Interaction is increasingly showing its potential in ASC. Data from these tools is multimodal in nature and is hence complex to store and analyze.
In our project we investigated a technological system, a full-body interactive Mixed Reality (MR) experience, to understand how full-body interactive systems can help children with Autism improve in social initiation behaviors. The approach of our project was to compare results from our MR experience with a typical LEGO based social intervention, where both mediate a face-to-face play session between an ASC child and a non-ASC child.
The project created a database called ASCMEOR, a reference database of multimodal data from sessions of ASC children and youngsters using ICT therapy and intervention tools. This is the first time that this type of data has been collected from ASC children interacting with complex ICT systems, stored in a database, and shared with experts around the world.
Through a collaboration with the “Multidisciplinary Unit on Autism Spectrum Disorder” of the Hospital Sant Joan de Déu, the unit provided links to the end users (i.e. high-functioning ASC children) on a local basis in the city of Barcelona. The target group was defined as children and young teenagers (8-12 years old). Participants had been formally diagnosed with ASC as determined by the Autism Diagnostic Observation Schedule (ADOS) module 3, which is designed for young people with verbal fluency, with a minimum diagnosed severity of 4. Verbal fluency was essential to achieve the level of collaboration required to play the game without the help of a psychologist or parent. To prevent problems playing or comprehending the game, both the ASC and non-ASC children had to have a minimum IQ of 70 according to the Wechsler Intelligence Scale for Children (WISC) and were screened for epilepsy. All procedures performed were in accordance with the 1964 Helsinki declaration and its later amendments or comparable ethical standards, and ethical approval was obtained from the ethical committee of the hospital and Universitat Pompeu Fabra. Informed consent was obtained from the legal representatives of all participants included in the study.
The experimental procedure was run with 36 ASC/non-ASC dyads following a repeated-measures design with two conditions: the full-body interaction MR environment and the typical social intervention strategy based on LEGO bricks. The children with ASC played with their non-ASC partner for 15 minutes in the MR system, and with the same partner for 15 minutes in the LEGO setup. All children participated in both experimental conditions, and the order was randomized for each pair to counterbalance any learning effects. There was a 5-minute break followed by relaxation training between the two conditions. The ASCMEOR dataset was generated from these experimental trials. Each experimental trial has an associated trial number, e.g., the first trial has the trial number “0001”. The data from each trial is organized in the following format:
(0) Experiment timeline;
(1) Video-coding of overt behaviors;
(2) System log files detailing system triggering of events;
(3) Questionnaires to the children and to the parents;
(4) Psychophysiological measures, i.e. electrocardiogram (ECG), electrodermal activity (EDA) and accelerometer (ACC).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The prevalence of overweight and obese people worldwide has dramatically increased in the last decades and is yet to peak. At the same time, and partly due to obesity and associated assisted reproduction, twinning rates have shown a clear rise in recent years. Adverse fetomaternal outcomes are known to occur in singleton and twin pregnancies in overweight and obese women. However, the impact of the obesity levels defined by the World Health Organization on the outcomes of twin pregnancies has not been thoroughly studied. Therefore, the purpose of this study is to examine how maternal overweight and the level of obesity affect fetomaternal outcomes in twin pregnancies, hypothesizing a higher likelihood of adverse outcomes with overweight and each obesity level. This is a retrospective cohort study of 2,349 twin pregnancies delivered at the Buergerhospital Frankfurt, Germany, between 2005 and 2020. The mothers were divided into exposure groups depending on their pre-gestational body mass index; these were normal weight (reference group), overweight, and obesity levels I, II, and III. A multivariate logistic regression analysis was performed to assess the influence of overweight and obesity on gestational diabetes mellitus, preeclampsia, postpartum hemorrhage, intrauterine fetal death, and a five-minute Apgar score below seven. The adjusted odds ratios for gestational diabetes compared to normal weight mothers were 1.47, 2.79, 4.05, and 6.40 for overweight and obesity levels I, II, and III, respectively (p = 0.015 for overweight and p < 0.001 for each obesity level). Maternal BMI had a significant association with the risk of preeclampsia (OR 1.04, p = 0.028). Overweight and obesity did not affect the odds of postpartum hemorrhage, fetal demise, or a low Apgar score.
While maternal overweight and obesity did not influence the fetal outcomes in twin pregnancies, they significantly increased the risk of gestational diabetes and preeclampsia, and that risk is incremental with increasing level of obesity.
Intracity Fare Estimation
Chennai (formerly known as Madras) is located on the Coromandel Coast off the Bay of Bengal. People from all over the world come to this marvelous city to spend their holidays, enjoy the natural splendor, and collect unforgettable memories.
Recently, Devesh Sethia went to Chennai to meet his friend Kartikay Singh, a reputed minister at the Ministry of Transport and Highways, India. While he was there, he was cheated by drivers many times. He asked Kartikay to look into the matter and suggested that he declare fare rates for the different modes of transport in a city. Kartikay liked the idea, but analyzing these fares was a difficult task for him. He therefore called one of his college friends, Mohammad Ausaf Jafri, who has deep knowledge of Machine Learning, and provided him with a dataset containing 20k samples of intracity fares in different cities of India.
He was given one month to come up with the best possible solution. These days, however, he is so busy with office work that he has little time to spend analyzing the dataset, so he needs help from others. Help him solve the problem.
We are providing a training dataset (of 20k samples) describing trips over the past 2 years (from 01/01/2015 to 31/12/2016) for different modes of transport in different cities of India. Each data sample corresponds to one complete trip. It contains 11 features, as follows:
Data Dictionary
Here's a brief version of what you'll find in the data description file.
Variable: Description
ID: a unique identifier for each sample
TIMESTAMP (Datetime): trip start time, formatted as (Year)-(Month)-(Day) (Hours):(Minutes):(Seconds)
STARTING_LATITUDE (Float): latitude of the trip start position, in degrees North
STARTING_LONGITUDE (Float): longitude of the trip start position, in degrees East
DESTINATION_LATITUDE (Float): latitude of the trip stop position, in degrees North
DESTINATION_LONGITUDE (Float): longitude of the trip stop position, in degrees East
VEHICLE_TYPE (String): type of transport vehicle used for the trip
TOTAL_LUGGAGE_WEIGHT (Float): total luggage carried by the passenger, in kilograms
WAIT_TIME (Float): time the driver waited for the passenger before the start of the trip, in minutes
TRAFFIC_STUCK_TIME (Integer): time the vehicle waited in traffic, in minutes
DISTANCE (Integer): total distance covered in the trip, in kilometres
FARE: trip cost
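As an illustration of how the coordinate and timestamp fields might be used, the sketch below parses a TIMESTAMP value and computes the great-circle (haversine) distance between the start and destination coordinates. The helper names are hypothetical, the timestamp format string is an assumption derived from the description above, and the straight-line distance is only a lower bound that could serve as a sanity check against the reported DISTANCE, not a replacement for it.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def parse_timestamp(value):
    """Parse a TIMESTAMP string; the exact format is assumed from the description."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

A trip whose reported DISTANCE is shorter than its haversine distance would be worth inspecting, since a road route cannot be shorter than the straight-line path.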
As of April 2024, Bahrain was the country with the highest Instagram audience reach with 95.6 percent. Kazakhstan also had a high Instagram audience penetration rate, with 90.8 percent of the population using the social network. In the United Arab Emirates, Turkey, and Brunei, the photo-sharing platform was used by more than 85 percent of each country's population.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets and model outputs used to map the global distribution of plants utilised by humans. The folder contains two subfolders, raw_data and processed_data. The raw_data subfolder holds the list of modelled utilised plant species (utilised_plants_species_list.csv); the processed_data subfolder holds their occurrence data (occurrence_data.zip) and predicted distributions (species_proba_per_cell.rds).
The file utilised_plants_species_list.csv in the raw_data folder contains a list of 35687 plant species (and hybrids) used by humans and 10 plant use categories with the following 14 fields:
plant_ID: plant identifier number ranging from 1 to 35687
binomial_acc_name: binomial accepted name of the plant species
author_acc_name: name of the author(s)
is_hybrid: logical TRUE or FALSE indicating whether the species is a hybrid or not.
AnimalFood: forage and fodder for vertebrate animals only.
EnvironmentalUses: examples include intercrops and nurse crops, ornamentals, barrier hedges, shade plants, windbreaks, soil improvers, plants for revegetation and erosion control, wastewater purifiers, indicators of the presence of metals, pollution, or underground water.
Fuels: charcoal, petroleum substitutes, fuel alcohols, etc. Given the importance of energy plants for people, those were distinguished from Materials.
GeneSources: wild relatives of major crops which may possess traits associated with biotic or abiotic resistance and may be valuable for breeding programs.
HumanFood: food for humans only, including beverages and food additives.
InvertebrateFood: plants consumed by invertebrates used by humans, such as bees, silkworms, lac insects and edible grubs.
Materials: woods, fibers, cork, cane, tannins, latex, resins, gums, waxes, oils, lipids, etc. and their derived products.
Medicines: both human and veterinary.
Poisons: plants which are poisonous to both vertebrates and invertebrates, both accidentally and intentionally, e.g., for hunting and fishing, molluscicides, herbicides, insecticides.
SocialsUses: plants used for social purposes, which cannot be defined as food or medicine, for instance, masticatories, smoking materials, narcotics, hallucinogens and psychoactive drugs, and plants with ritual or religious significance.
Totals: total number of uses recorded for a species
The zipfile occurrence_data.zip in the processed_data folder contains 35687 Comma Separated Values (CSV) files, one for each species, containing curated geographic occurrence records used to build species distribution models with the following 14 fields:
Species: the binomial accepted name of the species
Fullname: same as species
decimalLongitude: the geographic longitude of the occurrence records of the species in decimal degrees
decimalLatitude: the geographic latitude of the occurrence records of the species in decimal degrees
countryCode: a three-letter standard abbreviation for the country of the occurrence locality
coordinateUncertaintyinMeters: indicator for the accuracy of the coordinate location, described as the radius of a circle around the stated point location
year: year of the observation of the occurrence record of the species
individualCount: the number of individuals present at the time of the observation
gbifID: unique identifier number for the occurrence from the original database
basisOfRecords: the type of the individual record, e.g. observation, physical specimen, fossil, living ex-situ, culture collection specimen
institutionCode: the name of the institution or organization listed as the data publisher on GBIF
establishmentMeans: statement about whether an organism has been introduced to a given place and time through the direct or indirect activity of modern humans
is_cultivated_observation: whether or not an organism is cultivated
sourceID: name of the source database
The file species_proba_per_cell.rds in the processed_data folder is an R Data Serialization (RDS) file containing a data.table object with the following 3 fields:
plant_ID: plant identifier number ranging from 1 to 35687
proba: species occurrence probability
cell: raster grid cell number ranging from 1 to 2251762
This object can be used in combination with a raster layer to reconstruct the modelled distribution of each species or retrieve species richness and endemism.
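One way such a table could be aggregated is sketched below. It assumes the records have already been exported from the RDS file into (plant_ID, proba, cell) tuples, and it estimates richness as the sum of occurrence probabilities per cell. This is a common convention for stacked distribution models, not necessarily the method used by the dataset authors, and the function names are hypothetical.

```python
from collections import defaultdict


def expected_richness(rows):
    """Sum occurrence probabilities per cell.

    rows: iterable of (plant_id, proba, cell) tuples, mirroring the three
    fields of the table. Returns {cell: summed probability}.
    """
    totals = defaultdict(float)
    for _plant_id, proba, cell in rows:
        totals[cell] += proba
    return dict(totals)


def species_distribution(rows, plant_id):
    """Return {cell: proba} for a single species, ready to map onto a raster."""
    return {cell: proba for pid, proba, cell in rows if pid == plant_id}
```

The resulting cell numbers would still need to be assigned to the matching raster layer to produce a map.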
As of April 2024, around 16.5 percent of global active Instagram users were men between the ages of 18 and 24 years. More than half of Instagram users worldwide were aged 34 years or younger.
Teens and social media
As one of the biggest social networks worldwide, Instagram is especially popular with teenagers. As of fall 2020, the photo-sharing app ranked third in terms of preferred social network among teenagers in the United States, behind Snapchat and TikTok. Instagram was one of the most influential advertising channels among female Gen Z users when making purchasing decisions. Teens report feeling more confident, popular, and better about themselves when using social media, and less lonely, depressed and anxious.
Social media can also have negative effects on teens, and these are much more pronounced among those with low emotional well-being. It was found that 35 percent of teenagers with low social-emotional well-being reported having experienced cyberbullying when using social media, while only five percent of teenagers with high social-emotional well-being stated the same. As such, social media can have a big impact on already fragile states of mind.
As of April 2024, Facebook had an addressable ad audience reach of 131.1 percent in Libya, followed by the United Arab Emirates with 120.5 percent and Mongolia with 116 percent. Additionally, the Philippines and Qatar had addressable ad audience reaches of 114.5 percent and 111.7 percent, respectively.
Number and percentage of live births, by month of birth, 1991 to most recent year.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
The largest risk of child mortality occurs within the first week after birth. Early neonatal mortality remains a global public health concern, especially in sub-Saharan African countries. More than 75% of neonatal deaths occur within the first seven days of birth, but there are limited prospective follow-up studies to determine time to death, incidence, and predictors of death in Ethiopia, particularly in the study area. The study aimed to determine the incidence and predictors of early neonatal mortality among neonates admitted to the neonatal intensive care units of Addis Ababa public hospitals, Ethiopia, 2021.
Methods
An institutional prospective cohort study was conducted in four public hospitals in Addis Ababa City, Ethiopia, from June 7th, 2021 to July 13th, 2021. All early neonates consecutively admitted to the neonatal intensive care units of the selected hospitals were included in the study and followed until 7 days old. Data were coded, cleaned, edited, and entered into Epi data version 3.1 and then exported to STATA software version 14.0 for analysis. The Kaplan Meier survival curve with log-rank test was used to compare survival time between groups. Moreover, both bi-variable and multivariable Cox proportional hazard regression models were used to identify the predictors of early neonatal mortality. All variables having P-value ≤0.2 in the bi-variable analysis model were further fitted to the multivariable model. The assumption of the model was checked graphically and using a global test. The goodness of fit of the model was assessed using the Cox-Snell residual test and it was adequate.
Results
A total of 391 early neonates with their mothers were involved in this study. The incidence rate among admitted early neonates was 33.25 per 1000 neonate-days of observation [95% confidence interval (CI): 26.22, 42.17]. Preterm birth [adjusted hazard ratio (AHR): 6.0 (95% CI: 2.02, 17.50)], low fifth-minute Apgar score [AHR: 3.93 (95% CI: 1.5, 6.77)], low temperature [AHR: 2.67 (95% CI: 1.41, 5.02)], and resuscitation of the early neonate [AHR: 2.80 (95% CI: 1.51, 5.10)] were associated with an increased hazard of early neonatal death. However, early neonatal crying at birth [AHR: 0.48 (95% CI: 0.26, 0.87)] was associated with a reduced hazard of death.
Conclusions
Early neonatal mortality is high in Addis Ababa public hospitals. Preterm birth, low five-minute Apgar score, hypothermia, and crying at birth were found to be independent predictors of early neonatal death. Good care and attention should be given to neonates with low Apgar scores and to premature and hypothermic neonates.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides a global gridded (5 arc-min resolution) detailed annual net-migration dataset for 2000-2019. We also provide the global annual birth and death rate datasets that were used to estimate the net-migration for the same years. The dataset is presented in detail, with some further analyses, in the following publication. Please cite this paper when using the data.
Niva et al. 2023. World's human migration patterns in 2000-2019 unveiled by high-resolution data. Nature Human Behaviour 7: 2023–2037. Doi: https://doi.org/10.1038/s41562-023-01689-4
You can explore the data in our online net-migration explorer: https://wdrg.aalto.fi/global-net-migration-explorer/
Short introduction to the data
For the dataset, we collected, gap-filled, and harmonised:
comprehensive national-level birth and death rate datasets for a total of 216 countries or sovereign states; and
sub-national data for births (data covering 163 countries, divided altogether into 2555 admin units) and deaths (123 countries, 2067 admin units).
These birth and death rates were downscaled with selected socio-economic indicators to a 5 arc-min grid for each year 2000-2019. These allowed us to calculate the 'natural' population change, and by comparing this with the reported changes in population, we were able to estimate the annual net-migration. See Niva et al. (2023) for more about the methods and calculations.
We recommend using the data either over multiple years (we provide 3, 5 and 20 year net-migration sums at gridded level) or aggregated over a larger area (we provide adm0, adm1 and adm2 level geospatial polygon files). This is due to some noise in the gridded annual data.
Due to copyright issues we are not able to release all the original data collected, but those can be requested from the authors.
List of datasets
Birth and death rates:
raster_birth_rate_2000_2019.tif: Gridded birth rate for 2000-2019 (5 arc-min; multiband tif)
raster_death_rate_2000_2019.tif: Gridded death rate for 2000-2019 (5 arc-min; multiband tif)
tabulated_adm1adm0_birth_rate.csv: Tabulated sub-national birth rate for 2000-2019 at the division to which data was collected (subnational data when available, otherwise national)
tabulated_adm1adm0_death_rate.csv: Tabulated sub-national death rate for 2000-2019 at the division to which data was collected (subnational data when available, otherwise national)
Net-migration:
raster_netMgr_2000_2019_annual.tif: Gridded annual net-migration 2000-2019 (5 arc-min; multiband tif)
raster_netMgr_2000_2019_3yrSum.tif: Gridded 3-yr sum net-migration 2000-2019 (5 arc-min; multiband tif)
raster_netMgr_2000_2019_5yrSum.tif: Gridded 5-yr sum net-migration 2000-2019 (5 arc-min; multiband tif)
raster_netMgr_2000_2019_20yrSum.tif: Gridded 20-yr sum net-migration 2000-2019 (5 arc-min)
polyg_adm0_dataNetMgr.gpkg: National (adm 0 level) net-migration geospatial file (gpkg)
polyg_adm1_dataNetMgr.gpkg: Provincial (adm 1 level) net-migration geospatial file (gpkg) (where no adm 1 level division exists, adm 0 is used)
polyg_adm2_dataNetMgr.gpkg: Communal (adm 2 level) net-migration geospatial file (gpkg) (where no adm 2 level division exists, adm 1 is used; and where no adm 1 level division exists either, adm 0 is used)
Files to run online net migration explorer
masterData.rds and admGeoms.rds are related to our online ‘Net-migration explorer’ tool (https://wdrg.aalto.fi/global-net-migration-explorer/). The source code of this application is available in https://github.com/vvirkki/net-migration-explorer. Running the application locally requires these two .rds files from this repository.
Metadata
Grids:
Resolution: 5 arc-min (0.083333333 degrees)
Spatial extent: -180, 180; -90, 90 (xmin, xmax, ymin, ymax)
Coordinate ref system: EPSG:4326 - WGS 84
Format: Multiband geotiff; one band per year over 2000-2019
Units:
Birth and death rates: births/deaths per 1000 people per year
Net-migration: persons per 1000 people per time period (year, 3yr, 5yr, 20yr, depending on the dataset)
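The grid metadata above implies a 4320 x 2160 cell raster with 20 annual bands. The sketch below shows how a year and a coordinate might be mapped to a band and a cell, and how a per-1000 rate converts to a head count. It is a minimal sketch under stated assumptions: bands are taken to be ordered chronologically from 2000, rows are taken to run north to south as in a conventional north-up raster, and all function names are hypothetical. The band order of the actual files should be verified before use.

```python
import math

CELLS_PER_DEGREE = 12              # 5 arc-min resolution = 1/12 degree
N_COLS = 360 * CELLS_PER_DEGREE    # 4320 columns spanning -180..180
N_ROWS = 180 * CELLS_PER_DEGREE    # 2160 rows spanning 90..-90
FIRST_YEAR, LAST_YEAR = 2000, 2019


def year_to_band(year):
    """Return the 1-based band index, assuming bands are ordered 2000..2019."""
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError(f"year must be within {FIRST_YEAR}-{LAST_YEAR}")
    return year - FIRST_YEAR + 1


def lonlat_to_rowcol(lon, lat):
    """Return the 0-based (row, col) of the cell containing a WGS 84 point."""
    col = min(math.floor((lon + 180) * CELLS_PER_DEGREE), N_COLS - 1)
    row = min(math.floor((90 - lat) * CELLS_PER_DEGREE), N_ROWS - 1)
    return row, col


def rate_per_1000_to_persons(rate, population):
    """Convert a per-1000-people rate into a number of persons."""
    return rate * population / 1000
```

For example, a net-migration value of 5 per 1000 people in a cell with 2000 inhabitants corresponds to a net gain of 10 persons over the period covered by that dataset.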
Geospatial polygon (gpkg) files:
Spatial extent: -180, 180; -90, 83.67 (xmin, xmax, ymin, ymax)
Temporal extent: annual over 2000-2019
Coordinate ref system: EPSG:4326 - WGS 84
Format: gpkg
Units:
Net-migration: persons per 1000 people per year