19 datasets found

Data for: World's human migration patterns in 2000-2019 unveiled by...
data.niaid.nih.gov
Updated Jul 11, 2024
Taka, Maija (2024). Data for: World's human migration patterns in 2000-2019 unveiled by high-resolution data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7997133
Dataset provided by
Kummu, Matti
Virkki, Vili
Heino, Matias
Varis, Olli
Muttarak, Raya
Niva, Venla
Horton, Alexander
Abel, Guy J
Taka, Maija
Kinnunen, Pekka
Kallio, Marko
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
World
Description
This dataset provides a detailed global gridded (5 arc-min resolution) annual net-migration dataset for 2000-2019. We also provide the global annual birth and death rate datasets that were used to estimate net-migration for the same years. The dataset is presented in detail, with some further analyses, in the following publication. Please cite this paper when using the data.

Niva et al. 2023. World's human migration patterns in 2000-2019 unveiled by high-resolution data. Nature Human Behaviour 7: 2023–2037. Doi: https://doi.org/10.1038/s41562-023-01689-4

You can explore the data in our online net-migration explorer: https://wdrg.aalto.fi/global-net-migration-explorer/

Short introduction to the data

For the dataset, we collected, gap-filled, and harmonised:

comprehensive national-level birth and death rate datasets for altogether 216 countries or sovereign states; and

sub-national data for births (data covering 163 countries, divided altogether into 2555 admin units) and deaths (123 countries, 2067 admin units).

These birth and death rates were downscaled with selected socio-economic indicators to a 5 arc-min grid for each year in 2000-2019. This allowed us to calculate the 'natural' population change; by comparing it with the reported changes in population, we were able to estimate the annual net-migration. See Niva et al. (2023) for more about the methods and calculations.
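The balance described above can be sketched for a single grid cell and year. This is only a minimal illustration: the function names are hypothetical, the rates are assumed to be per 1000 people per year, and applying them to the start-of-year population is a simplification, not necessarily the procedure used in Niva et al. (2023).

```python
def natural_change(population, birth_rate, death_rate):
    """Births minus deaths, with both rates given per 1000 people per year."""
    return population * (birth_rate - death_rate) / 1000.0


def net_migration(pop_start, pop_end, birth_rate, death_rate):
    """Reported population change minus the 'natural' change.

    Assumption: rates are applied to the start-of-year population.
    """
    reported_change = pop_end - pop_start
    return reported_change - natural_change(pop_start, birth_rate, death_rate)


# Illustrative numbers only (not taken from the dataset).
example = net_migration(pop_start=10_000, pop_end=10_150,
                        birth_rate=20.0, death_rate=8.0)
print(example)
```

A positive value means more people arrived than left; a negative value means net out-migration.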

We recommend using the data either over multiple years (we provide 3, 5 and 20 year net-migration sums at gridded level) or aggregated over a larger area (we provide adm0, adm1 and adm2 level geospatial polygon files). This is due to some noise in the gridded annual data.

Due to copyright issues we are not able to release all the original data collected, but they can be requested from the authors.

List of datasets

Birth and death rates:

raster_birth_rate_2000_2019.tif: Gridded birth rate for 2000-2019 (5 arc-min; multiband tif)

raster_death_rate_2000_2019.tif: Gridded death rate for 2000-2019 (5 arc-min; multiband tif)

tabulated_adm1adm0_birth_rate.csv: Tabulated sub-national birth rate for 2000-2019 at the division to which data was collected (subnational data when available, otherwise national)

tabulated_adm1adm0_death_rate.csv: Tabulated sub-national death rate for 2000-2019 at the division to which data was collected (subnational data when available, otherwise national)

Net-migration:

raster_netMgr_2000_2019_annual.tif: Gridded annual net-migration 2000-2019 (5 arc-min; multiband tif)

raster_netMgr_2000_2019_3yrSum.tif: Gridded 3-yr sum net-migration 2000-2019 (5 arc-min; multiband tif)

raster_netMgr_2000_2019_5yrSum.tif: Gridded 5-yr sum net-migration 2000-2019 (5 arc-min; multiband tif)

raster_netMgr_2000_2019_20yrSum.tif: Gridded 20-yr sum net-migration 2000-2019 (5 arc-min)

polyg_adm0_dataNetMgr.gpkg: National (adm 0 level) net-migration geospatial file (gpkg)

polyg_adm1_dataNetMgr.gpkg: Provincial (adm 1 level) net-migration geospatial file (gpkg) (if not adm 1 level division, adm 0 used)

polyg_adm2_dataNetMgr.gpkg: Communal (adm 2 level) net-migration geospatial file (gpkg) (if not adm 2 level division, adm 1 used; and if not adm 1 level division either, adm 0 used)

Files to run online net migration explorer

masterData.rds and admGeoms.rds are related to our online ‘Net-migration explorer’ tool (https://wdrg.aalto.fi/global-net-migration-explorer/). The source code of this application is available in https://github.com/vvirkki/net-migration-explorer. Running the application locally requires these two .rds files from this repository.

Metadata

Grids:

Resolution: 5 arc-min (0.083333333 degrees)

Spatial extent: Lon: -180, 180; Lat: -90, 90 (xmin, xmax, ymin, ymax)

Coordinate ref system: EPSG:4326 - WGS 84

Format: Multiband geotiff; each band for each year over 2000-2019

Units:

Birth and death rates: births/deaths per 1000 people per year

Net-migration: persons per 1000 people per time period (year, 3yr, 5yr, 20yr, depending on the dataset)
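The grid metadata above implies some simple index arithmetic, sketched below. Two things are assumptions rather than facts stated in the source: that band 1 corresponds to 2000 and band 20 to 2019, and that row 0 is the northernmost row (the usual GeoTIFF layout). Check both against the actual files before relying on them.

```python
CELLS_PER_DEG = 12                 # 5 arc-min cells: 60 / 5 per degree
N_COLS = 360 * CELLS_PER_DEG       # 4320 columns for lon -180..180
N_ROWS = 180 * CELLS_PER_DEG       # 2160 rows for lat -90..90
FIRST_YEAR, LAST_YEAR = 2000, 2019


def band_for_year(year):
    """1-based band index, ASSUMING band 1 = 2000 ... band 20 = 2019."""
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError(f"year {year} outside {FIRST_YEAR}-{LAST_YEAR}")
    return year - FIRST_YEAR + 1


def cell_for_lonlat(lon, lat):
    """(row, col) of the cell containing a WGS 84 point.

    ASSUMES row 0 is the northernmost row and col 0 the westernmost column.
    """
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError("coordinates outside the grid extent")
    # Points exactly on the east or south edge are clamped into the last cell.
    col = min(int((lon + 180.0) * CELLS_PER_DEG), N_COLS - 1)
    row = min(int((90.0 - lat) * CELLS_PER_DEG), N_ROWS - 1)
    return row, col
```

For example, `band_for_year(2010)` would select the eleventh band under the stated ordering assumption.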

Geospatial polygon (gpkg) files:

Spatial extent: -180, 180; -90, 83.67 (xmin, xmax, ymin, ymax)

Temporal extent: annual over 2000-2019

Coordinate ref system: EPSG:4326 - WGS 84

Format: gpkg

Units:

Net-migration: persons per 1000 people per year
#IndiaNeedsOxygen Tweets
kaggle.com
zip
Updated Nov 14, 2021
Kash (2021). #IndiaNeedsOxygen Tweets [Dataset]. https://www.kaggle.com/kaushiksuresh147/indianeedsoxygen-tweets
Available download formats: zip (4441094 bytes)
Authors
Kash
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
India marks one COVID-19 death every 5 minutes

Content

People across India scrambled for life-saving oxygen supplies on Friday and patients lay dying outside hospitals as the capital recorded the equivalent of one death from COVID-19 every five minutes.

For the second day running, the country’s overnight infection total was higher than ever recorded anywhere in the world since the pandemic began last year, at 332,730.

India’s second wave has hit with such ferocity that hospitals are running out of oxygen, beds, and anti-viral drugs. Many patients have been turned away because there was no space for them, doctors in Delhi said.

Mass cremations have been taking place as the crematoriums have run out of space. Ambulance sirens sounded throughout the day in the deserted streets of the capital, one of India’s worst-hit cities, where a lockdown is in place to try and stem the transmission of the virus.

Dataset

The dataset consists of tweets made with the #IndiaWantsOxygen hashtag over the past week. It contains 25,440 tweets in total and will be updated on a daily basis.

The description of the features is given below:

| No | Columns | Descriptions |
| -- | -- | -- |
| 1 | user_name | The name of the user, as they’ve defined it. |
| 2 | user_location | The user-defined location for this account’s profile. |
| 3 | user_description | The user-defined UTF-8 string describing their account. |
| 4 | user_created | Time and date, when the account was created. |
| 5 | user_followers | The number of followers an account currently has. |
| 6 | user_friends | The number of friends an account currently has. |
| 7 | user_favourites | The number of favorites an account currently has |
| 8 | user_verified | When true, indicates that the user has a verified account |
| 9 | date | UTC time and date when the Tweet was created |
| 10 | text | The actual UTF-8 text of the Tweet |
| 11 | hashtags | All the other hashtags posted in the tweet along with #IndiaWantsOxygen |
| 12 | source | Utility used to post the Tweet, Tweets from the Twitter website have a source value - web |
| 13 | is_retweet | Indicates whether this Tweet has been Retweeted by the authenticating user. |
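As a rough sketch of how these columns might be used, the snippet below summarises a CSV with a subset of the listed column names. The two sample rows are made up for illustration, and the assumption that boolean fields are stored as the strings "True"/"False" is not confirmed by the source.

```python
import csv
import io
from collections import Counter

# Made-up rows for illustration; only the column names come from the table.
SAMPLE = """user_name,user_followers,user_verified,date,text,source,is_retweet
alice,120,False,2021-04-23 10:00:00,Need oxygen in Delhi,Twitter for Android,False
bob,98000,True,2021-04-23 10:05:00,Sharing verified leads,Twitter Web App,False
"""


def as_bool(value):
    """Interpret 'True'/'False' text (an assumed encoding) as a boolean."""
    return value.strip().lower() == "true"


def summarize(csv_file):
    """Count tweets, verified authors, and tweets per posting utility."""
    rows = list(csv.DictReader(csv_file))
    return {
        "tweets": len(rows),
        "verified": sum(as_bool(r["user_verified"]) for r in rows),
        "by_source": Counter(r["source"] for r in rows),
    }


summary = summarize(io.StringIO(SAMPLE))
print(summary["tweets"], summary["verified"])
```

To run it on the real file, pass an open file object instead of the in-memory sample.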

Acknowledgements

https://globalnews.ca/news/7785122/india-covid-19-hospitals-record/ Image courtesy: BBC and Reuters

Inspiration

The past few days have been really depressing after seeing these incidents. These tweets are the voice of Indians requesting help and of people all over the globe asking their own countries to support India by providing oxygen tanks.

And I strongly believe that this is not just some data, but the pure emotions of people and their call for help. And I hope we as data scientists could contribute on this front by providing valuable information and insights.
Coronavirus (Covid19) — Evolution by country and around the world (daily...
gimi9.com
Coronavirus (Covid19) — Evolution by country and around the world (daily maj) [Dataset]. https://gimi9.com/dataset/eu_5e5da8356f44412b1755a8f6/
Area covered
World
Description
[Edit 12/09/2020] The files below now contain only the last 30 days, because too many people ignore the request not to fetch the dataset too often (there is no point in fetching it every minute when the file changes only 4 or 5 times a day). If you want access to the entire history, contact me.

[Edit 31/03/2020] Since yesterday, I retrieve the current day's data from the CSSE, so same-day data are now available and updated several times a day (about every hour) as new figures come in from around the world. The previous day's data are always consolidated around 2 am (no longer 1 am, since the time change). If you only want complete data, ignore the last day (today’s date).

Here I share the data that I compile from the well-known coronavirus world map created and maintained by The Johns Hopkins University, which I use to display coronavirus statistics worldwide and by country. They publish each day’s data every night in a GitHub repository. My tools compile the new data as soon as they are available, and I share the result here.

This data is used to display tables and graphs on the coronavirus (Covid19) website of Politologue.com: https://coronavirus.politologue.com/

This data lets you make your own graphs and analyses if you are interested in the subject. You are not obliged to, but if my compilation helps you and saves you time, a link to https://coronavirus.politologue.com/ would be appreciated.

Information in the files (csv and json):

- Number of cases
- Number of deaths
- Number of recoveries
- Death rate (percentage)
- Recovery rate (percentage)
- Infection rate (persons still infected, not deceased or recovered) (percentage)
- For data by country, a “country” field

If you integrate the json or csv client-side on a site or application, please keep a cache on your own servers to avoid putting an unexpected load on mine.
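The three percentage fields can plausibly be derived from the three counts. The sketch below assumes each rate is expressed as a share of cumulative cases; the source does not state the exact formulas, and the function name is hypothetical.

```python
def covid_rates(cases, deaths, recovered):
    """Return death, recovery and infection rates as percentages of cases.

    Assumption: all three rates use cumulative cases as the denominator,
    and 'infection' means cases that are neither deceased nor recovered.
    """
    if cases <= 0:
        raise ValueError("cases must be positive")
    still_infected = cases - deaths - recovered
    return {
        "death_rate": 100.0 * deaths / cases,
        "recovery_rate": 100.0 * recovered / cases,
        "infection_rate": 100.0 * still_infected / cases,
    }


# Illustrative counts only (not taken from the dataset).
rates = covid_rates(cases=1000, deaths=20, recovered=700)
print(rates)
```

Under this assumption the three rates always sum to 100 percent, which is a quick consistency check against the published fields.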
Multipurpose World News Dataset
kaggle.com
Updated Jan 25, 2021
Gabriel Gazola Milan (2021). Multipurpose World News Dataset [Dataset]. https://www.kaggle.com/datasets/gabrielmilan/multipurpose-world-news-dataset/data
Dataset provided by
Kaggle http://kaggle.com/
Authors
Gabriel Gazola Milan
License
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Area covered
World
Description
Content

This is a dataset I started building for my future personal projects, as I think this kind of data is quite hard to acquire for free and in a short time. I started acquiring data on March 21st, 2020 and intend to keep doing so continuously.

Inside you will find news extracted from the following sources:

Foxbusiness.com

Youtube.com

Cnet.com

The Verge

Nytimes.com

Rawstory.com

Investors.com

Wreg.com

Reuters

Koin.com

Inc.com

CNBC

Nj.com

Wmtw.com

Nbcdfw.com

Bloomberg

Wowt.com

Bbc.com

Every 20 minutes, a script checks for new headlines on these sources and adds them to a database. This CSV file is generated from that database.
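A collection loop of this kind could look roughly like the sketch below. It is not the author's script: the table layout, column names and function names are hypothetical, and an in-memory SQLite database stands in for whatever database is actually used. The point is that a uniqueness constraint lets repeated polls add only unseen headlines.

```python
import csv
import io
import sqlite3


def make_db():
    """Create an in-memory database with one row per (source, title)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE headlines (source TEXT, title TEXT, fetched_at TEXT, "
        "UNIQUE (source, title))"
    )
    return conn


def add_headlines(conn, source, titles, fetched_at):
    """Insert only headlines not seen before; return how many were new."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO headlines VALUES (?, ?, ?)",
        [(source, title, fetched_at) for title in titles],
    )
    conn.commit()
    return conn.total_changes - before


def export_csv(conn):
    """Dump the whole table as CSV text, oldest rows first."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["source", "title", "fetched_at"])
    writer.writerows(conn.execute(
        "SELECT source, title, fetched_at FROM headlines ORDER BY rowid"))
    return out.getvalue()


# Two simulated polls 20 minutes apart; titles are placeholders.
conn = make_db()
first = add_headlines(conn, "Reuters", ["A", "B"], "2020-03-21T00:00")
second = add_headlines(conn, "Reuters", ["B", "C"], "2020-03-21T00:20")
csv_text = export_csv(conn)
```

Here the second poll stores only the one headline that the first poll had not already seen.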

I intend to update this dataset every day if I can (and if the machine I run this script on is up).
Richness index (2010) - ClimAfrica WP4
data.amerigeoss.org
data.apps.fao.org
http, pdf, png, wms +1
Updated Feb 6, 2023
Food and Agriculture Organization (2023). Richness index (2010) - ClimAfrica WP4 [Dataset]. https://data.amerigeoss.org/dataset/5d112b2b-9793-4484-808c-4a6172c5d4d0
Available download formats: png, pdf, http, zip, wms
Dataset provided by
Food and Agriculture Organization http://fao.org/
License
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Description
The “richness index” represents the level of economic wellbeing of a given area of a country in 2010. Regions with higher income per capita, a lower poverty rate and better access to markets are wealthier and are therefore better able to prepare for and respond to adversity. The index results from the second cluster of the Principal Component Analysis performed among 9 potential variables. The analysis identifies four dominant variables, namely “GDPppp per capita”, “agriculture share GDP per agriculture sector worker”, “poverty rate” and “market accessibility”, assigning weights of 0.33, 0.26, 0.25 and 0.16, respectively. Before the analysis, all variables were log-transformed (except the “agriculture share GDP per agriculture sector worker”) to reduce extreme variation and then score-standardized (converted to a distribution with a mean of 0 and a standard deviation of 1; the inverse method was applied for the “poverty rate” and “market accessibility”) in order to be comparable.

The 0.5 arc-minute grid of total GDPppp is based on the night-time light satellite imagery of NOAA (see Ghosh, T., Powell, R., Elvidge, C. D., Baugh, K. E., Sutton, P. C., & Anderson, S. (2010). Shedding light on the global distribution of economic activity. The Open Geography Journal (3), 148-161) and adjusted to the national total as recorded by the International Monetary Fund for 2010. The “GDPppp per capita” was calculated by dividing the total GDPppp by the population in each pixel. A focal statistic was then run to determine mean values within 10 km. This has a smoothing effect and represents some of the extended influence of intense economic activity on local people.

Country-based data for “agriculture share GDP per agriculture sector worker” were calculated from the GDPppp (data from the International Monetary Fund) fraction from agricultural activity (measured by the World Bank) divided by the number of workers in the agriculture sector (data from the World Bank). The tabular data represent the average of the period 2008-2012 and were linked by country unit to the national boundaries shapefile (FAO/GAUL) and then converted into raster format (resolution 0.5 arc-minute).

The first administrative level data for the “poverty rate” were estimated by NOAA for 2003 using night-time lights satellite imagery. Tabular data were linked by first administrative unit to the first administrative boundaries shapefile (FAO/GAUL) and then converted into raster format (resolution 0.5 arc-minute).

The 0.5 arc-minute grid “market accessibility” measures the travel time in minutes to large cities (with a population greater than 50,000 people). This dataset was developed by the European Commission and the World Bank to represent access to markets, schools, hospitals, etc. The dataset captures the connectivity and the concentration of economic activity (in 2000). Markets may be important for a variety of reasons, including their ability to spread risk and increase incomes. Markets are a means of linking people both spatially and over time. That is, they allow shocks (and risks) to be spread over wider areas. In particular, markets should make households less vulnerable to (localized) covariate shocks.

This dataset has been produced in the framework of the “Climate change predictions in Sub-Saharan Africa: impacts and adaptations (ClimAfrica)” project, Work Package 4 (WP4). More information on the ClimAfrica project is provided in the Supplemental Information section of this metadata.
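The weighting scheme described above can be sketched as follows. The weights and the choice of which variables are log-transformed and inverted come from the description; everything else is an assumption made for illustration: the dictionary keys are hypothetical names, the "inverse method" is interpreted as flipping the sign of the standardized score, the population standard deviation is used, and the input values are invented.

```python
import math
from statistics import mean, pstdev

# Weights from the description; the keys are hypothetical variable names.
WEIGHTS = {
    "gdp_ppp_per_capita": 0.33,
    "agri_gdp_per_worker": 0.26,
    "poverty_rate": 0.25,
    "market_accessibility": 0.16,
}
LOG_TRANSFORMED = {"gdp_ppp_per_capita", "poverty_rate", "market_accessibility"}
INVERTED = {"poverty_rate", "market_accessibility"}  # higher value = poorer


def z_scores(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def richness_index(columns):
    """Weighted sum of standardized variables, one value per area."""
    n_areas = len(next(iter(columns.values())))
    index = [0.0] * n_areas
    for name, weight in WEIGHTS.items():
        values = columns[name]
        if name in LOG_TRANSFORMED:
            values = [math.log(v) for v in values]
        sign = -1.0 if name in INVERTED else 1.0
        for i, score in enumerate(z_scores(values)):
            index[i] += weight * sign * score
    return index


# Invented values for three areas, ordered from richest to poorest.
areas = {
    "gdp_ppp_per_capita": [8000.0, 3000.0, 900.0],
    "agri_gdp_per_worker": [5000.0, 2000.0, 700.0],
    "poverty_rate": [10.0, 35.0, 60.0],
    "market_accessibility": [30.0, 120.0, 400.0],  # travel minutes
}
index = richness_index(areas)
```

Because every input is standardized, the resulting index is relative: it ranks areas against each other rather than measuring wealth in absolute units.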

Data publication: 2014-05-15

Supplemental Information:

ClimAfrica was an international project funded by the European Commission under the 7th Framework Programme (FP7) for the period 2010-2014. The ClimAfrica consortium was formed by 18 institutions: 9 from Europe, 8 from Africa, and the Food and Agriculture Organization of the United Nations (FAO).

ClimAfrica was conceived to respond to the urgent international need for the most appropriate and up-to-date tools and methodologies to better understand and predict climate change, assess its impact on African ecosystems and population, and develop the correct adaptation strategies. Africa is probably the continent most vulnerable to climate change and climate variability, and it shows a diverse range of agro-ecological and geographical features. Thus the impacts of climate change can be very high and can differ greatly across the continent, and even within countries.

The project focused on the following specific objectives:

Develop improved climate predictions on seasonal to decadal climatic scales, especially relevant to SSA;

Assess climate impacts in key sectors of SSA livelihood and economy, especially water resources and agriculture;

Evaluate the vulnerability of ecosystems and civil population to inter-annual variations and longer trends (10 years) in climate;

Suggest and analyse new, suitable adaptation strategies, focused on local needs;

Develop a new concept of a 10-year monitoring and forecasting warning system, useful for food security, risk management and civil protection in SSA;

Analyse the economic impacts of climate change on agriculture and water resources in SSA and the cost-effectiveness of potential adaptation measures.

The work of the ClimAfrica project was broken down into the following closely connected work packages (WPs). All the activities described in WP1, WP2, WP3, WP4 and WP5 consider the domain of the entire Sub-Saharan Africa region. Only WP6 has a country-specific (watershed) spatial scale, where model validation and detailed process analysis are carried out.

Contact points:

Metadata Contact: FAO-Data

Resource Contact: Selvaraju Ramasamy

Resource constraints:

copyright

Online resources:

Richness index (2010)

Project deliverable D4.1 - Scenarios of major production systems in Africa

Climafrica Website - Climate Change Predictions In Sub-Saharan Africa: Impacts And Adaptations
World Soccer live data feed
kaggle.com
Updated Jan 28, 2019
Mohammad Ghahramani (2019). World Soccer live data feed [Dataset]. https://www.kaggle.com/datasets/analystmasters/world-soccer-live-data-feed/discussion
Dataset provided by
Kaggle http://kaggle.com/
Authors
Mohammad Ghahramani
Description
Context

This is the first live data stream on Kaggle providing a simple yet rich source of all soccer matches around the world 24/7 in real-time.

What makes it unique compared to other datasets?

It is the first live data feed on Kaggle and it is totally free

Unlike “Churn rate” datasets you do not have to wait months to evaluate your predictions; simply check the match’s outcome in a couple of hours

You can use your predictions/analysis for your own benefit instead of spending your time and resources on helping a company maximize its profit

A five-year-old laptop can do the calculations and you do not need high-end GPUs

Couldn’t make it to the top 3 submissions? Never mind, you still have the chance to get your prize on your own

You can’t get accurate results on all samples? Do not worry, just filter out the hard ones (e.g. ignore international friendlies) and simply choose the ones you are sure of.

Need help from human experts for each sample? Every sample comes with at least two opinions from experts

You wish you could add your complementary data? Just contact us and we will try to facilitate it.

Couldn’t win “Warren Buffett's 2018 March Madness Bracket Contest”? Here is your chance to make your accumulative profit.

Simply train your algorithm on the first version of training dataset of approximately 11.5k matches and predict the data provided in the following data feed.

Fetch the data stream

The CSV file is updated every 30 minutes, at minutes 20 and 50 of every hour. Please do not download it more than twice per hour, as each download incurs additional cost.

You can download the CSV data file from the Amazon S3 server at the following link, after replacing FOLDER_NAME as described below:

https://s3.amazonaws.com/FOLDER_NAME/amasters.csv

* Substitute FOLDER_NAME with "analyst-masters".
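The substitution and the twice-per-hour request can be sketched as below. The URL template and folder name come from the text above; `feed_url` and `DownloadGate` are hypothetical helper names, and the 30-minute minimum spacing is one simple way to stay within two downloads per hour. The sketch performs no network access.

```python
import time

URL_TEMPLATE = "https://s3.amazonaws.com/FOLDER_NAME/amasters.csv"
MIN_INTERVAL_S = 30 * 60  # at most two downloads per hour


def feed_url(folder_name="analyst-masters"):
    """Fill the FOLDER_NAME placeholder in the feed URL."""
    return URL_TEMPLATE.replace("FOLDER_NAME", folder_name)


class DownloadGate:
    """Allows a download only if enough time has passed since the last one."""

    def __init__(self, min_interval_s=MIN_INTERVAL_S, clock=time.monotonic):
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last = None

    def allow(self):
        now = self._clock()
        if self._last is not None and now - self._last < self._min_interval_s:
            return False
        self._last = now
        return True


# Simulated clock readings in seconds: 0 min, 10 min, 30 min.
ticks = iter([0, 600, 1800])
gate = DownloadGate(clock=lambda: next(ticks))
decisions = [gate.allow(), gate.allow(), gate.allow()]
```

In real use, call `gate.allow()` before each fetch of `feed_url()` and skip the request when it returns `False`.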

Content

Our goal is to identify the outcome of a match as Home, Draw or Away. The variety of sources and the nature of the information provided in this data stream make it a unique database. Currently, five servers are collecting data from soccer matches around the world, communicating with each other and finally aggregating the data based on the dominant features learned from 400,000 matches over 7 years. I describe every column and the data collection below in two categories: Category I – Current situation and Category II – Head-to-Head History. We therefore divide the type of data we have from each team into 4 modes:

Mode 1: we have both Category I and Category II available

Mode 2: we only have Category I available

Mode 3: we only have Category II available

Mode 4: neither Category I nor Category II is available
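The four modes amount to a small lookup on two availability flags, sketched below. How availability is detected in the CSV is not stated in the source, so the two boolean inputs and the function name are hypothetical.

```python
def data_mode(has_current_situation, has_head_to_head):
    """Map availability of Category I and Category II data to modes 1-4."""
    if has_current_situation and has_head_to_head:
        return 1  # both categories available
    if has_current_situation:
        return 2  # only Category I (current situation)
    if has_head_to_head:
        return 3  # only Category II (head-to-head history)
    return 4      # neither category available


modes = [data_mode(True, True), data_mode(True, False),
         data_mode(False, True), data_mode(False, False)]
```

Such a flag could be used to train or evaluate separately on matches with richer and poorer information.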

Below you can find a full illustration of each category.

I. Current situation

Col 1 to 3:

Votes_for_Home Votes_for_Draw Votes_for_Away

The most distinctive parts of the database are these 3 columns. We are releasing the opinions of over 100 professional soccer analysts predicting the outcome of a match. Their votes are the result of every piece of information they receive on players, team line-up, injuries and the urge of a team to win a match to stay in the league. They are spread around the world in various time zones and are experts on soccer teams from various regions. Our servers aggregate their opinions to update the CSV file until kickoff. Therefore, even if 40 users predict that Real Madrid wins against Real Sociedad at the Santiago Bernabeu on January 6th, 2019, if 5 users predict that Real Sociedad (the away team) will be the winner, you should doubt the home win. Here, the “majority of votes” works in conjunction with other features.

Col 4 to 9:

Weekday Day Month Year Hour Minute

There are over 60,000 matches during a year, and approximately 400 ones are usually held per day on weekends. More critical and exciting matches, which are usually less predictable, are held toward the evening in Europe. We are currently providing time in Central Europe Time (CET) equivalent to GMT +01:00.

* Please note that the 2nd row of the CSV file gives the time at which data values from all servers were saved to the file.

Col 10 to 13:

Total_Bettors Bet_Perc_on_Home Bet_Perc_on_Draw Bet_Perc_on_Away

This data is recorded a few hours before the match, as people place bets emotionally when kickoff approaches. The columns for the “Home”, “Draw” and “Away” outcomes give the percentage of the total number of bettors, denoted “Total_Bettors”, who bet on each outcome.

Col 14 to 15:

Team_1 Team_2

The team playing “Home” is “Team_1” and the opponent playing “Away” is “Team_2”.

Col 16 to 36:

League_Rank_1 League_Rank_2 Total_teams Points_1 Points_2 Max_points Min_points Won_1 Draw_1 Lost_1 Won_2 Draw_2 Lost_2 Goals_Scored_1 Goals_Scored_2 Goals_Rec_1 Goal_Rec_2 Goals_Diff_1 Goals_Diff_2

If the match is betw...
In The Wild (audio Deepfake)
kaggle.com
zip
Updated Apr 20, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Abdalla Mohamed (2024). In The Wild (audio Deepfake) [Dataset]. https://www.kaggle.com/datasets/abdallamohamed312/in-the-wild-audio-deepfake
Available download formats: zip (0 bytes)
Authors
Abdalla Mohamed
License
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
'In-the-Wild' Dataset

We present a dataset of audio deepfakes (and corresponding benign audio) for a set of politicians and other public figures, collected from publicly available sources such as social networks and video streaming platforms. For n = 58 celebrities and politicians, we collect both bona-fide and spoofed audio. In total, we collect 20.8 hours of bona-fide and 17.2 hours of spoofed audio. On average, there are 23 minutes of bona-fide and 18 minutes of spoofed audio per speaker.

The dataset is intended to be used for evaluating deepfake detection and voice anti-spoofing machine-learning models. It is especially useful to judge a model's capability to generalize to realistic, in-the-wild audio samples. Find more information in our paper, and download the dataset here.

The most interesting deepfake detection models we used in our experiments are open-source on GitHub:

RawNet 2

RawGAT-ST

PC-Darts

This dataset and the associated documentation are licensed under the Apache License, Version 2.0.
Worldwide Soundscapes project meta-data
zenodo.org
Updated Dec 9, 2022
Kevin F.A. Darras; Rodney Rountree; Steven Van Wilgenburg; Amandine Gasc; 松海李; 黎君董; Yuhang Song; Youfang Chen; Thomas Cherico Wanger (2022). Worldwide Soundscapes project meta-data [Dataset]. http://doi.org/10.5281/zenodo.7415473
Unique identifier
https://doi.org/10.5281/zenodo.7415473
Dataset provided by
Zenodo http://zenodo.org/
Authors
Kevin F.A. Darras; Rodney Rountree; Steven Van Wilgenburg; Amandine Gasc; 松海李; 黎君董; Yuhang Song; Youfang Chen; Thomas Cherico Wanger
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Worldwide Soundscapes project is a global, open inventory of spatio-temporally replicated soundscape datasets. This Zenodo entry comprises the data tables that constitute its (meta-)database, as well as their description.

The overview of all sampling sites can be found on the corresponding project on ecoSound-web, as well as a demonstration collection containing selected recordings. More information on the project can be found here and on ResearchGate.

The audio recording criteria justifying inclusion into the meta-database are:

Stationary (no transects, towed sensors or microphones mounted on cars)

Passive (unattended, no human disturbance by the recordist)

Ambient (no spatial or temporal focus on a particular species or direction)

Spatially and/or temporally replicated (multiple sites sampled at least at one common daytime or multiple days sampled at least in one common site)

The individual columns of the provided data tables are described in the following. Data tables are linked through primary keys; joining them will result in a database.

datasets

dataset_id: incremental integer, primary key

name: name of the dataset. If it is repeated, incremental integers should be used in the "subset" column to differentiate them.

subset: incremental integer that can be used to distinguish datasets with identical names

collaborators: full names of people deemed responsible for the dataset, separated by commas

contributors: full names of people who are not the main collaborators but who have significantly contributed to the dataset, and who could be contacted for in-depth analyses, separated by commas.

date_added: when the dataset was added (DD/MM/YYYY)

URL_open_recordings: if recordings (even only some) from this dataset are openly available, indicate the internet link where they can be found.

URL_project: internet link for further information about the corresponding project

DOI_publication: DOI of corresponding publications, separated by comma

core_realm_IUCN: The core realm of the dataset. Datasets may have multiple realms, but the main one should be listed. Datasets may contain sampling sites from different realms in the "sites" sheet. IUCN Global Ecosystem Typology (v2.0): https://global-ecosystems.org/

medium: the physical medium the microphone is situated in

protected_area: Whether the sampling sites were situated in protected areas or not, or only some.

GADM0: For datasets on land or in territorial waters, Global Administrative Database level0 (https://gadm.org/)

GADM1: For datasets on land or in territorial waters, Global Administrative Database level1 (https://gadm.org/)

GADM2: For datasets on land or in territorial waters, Global Administrative Database level2 (https://gadm.org/)

IHO: For marine locations, the sea area that encompasses all the sampling locations according to the International Hydrographic Organisation. Map here: https://www.arcgis.com/home/item.html?id=44e04407fbaf4d93afcb63018fbca9e2

locality: optional free text about the locality

latitude_numeric_region: study region approximate centroid latitude in WGS84 decimal degrees

longitude_numeric_region: study region approximate centroid longitude in WGS84 decimal degrees

sites_number: number of sites sampled

year_start: starting year of the sampling

year_end: ending year of the sampling

deployment_schedule: description of the sampling schedule, provisional

temporal_recording_selection: list environmental exclusion criteria that were used to determine which recording days or times to discard

high_pass_filter_Hz: frequency of the high-pass filter of the recorder, in Hz

variable_sampling_frequency: Does the sampling frequency vary? If it does, write "NA" in the sampling_frequency_kHz column and indicate it in the sampling_frequency_kHz column inside the deployments sheet

sampling_frequency_kHz: frequency the microphone was sampled at (sounds up to half that frequency will be recorded)

variable_recorder:

recorder: recorder model used

microphone: microphone used

freshwater_recordist_position: position of the recordist relative to the microphone during sampling (only for freshwater)

collaborator_comments: free-text field for comments by the collaborators

validated: This cell is checked if the contents of all sheets are complete and have been found to be coherent and consistent with our requirements.

validator_name: name of person doing the validation

validation_comments: validators: please insert the date when someone was contacted

cross-check: this cell is checked if the collaborators confirm the spatial and temporal data after checking the corresponding site maps, deployment and operation time graphs found at https://drive.google.com/drive/folders/1qfwXH_7dpFCqyls-c6b8RZ_fbcn9kXbp?usp=share_link

datasets-sites

dataset_ID: primary key of datasets table

dataset_name: lookup field

site_ID: primary key of sites table

site_name: lookup field

sites

site_ID: unique site IDs, larger than 1000 for compatibility with ecoSound-web

site_name: name or code of sampling site as used in respective projects

latitude_numeric: exact numeric degrees coordinates of latitude

longitude_numeric: exact numeric degrees coordinates of longitude

topography_m: for sites on land: elevation; for marine sites: depth (negative). In meters.

freshwater_depth_m

realm: Ecosystem type according to IUCN GET https://global-ecosystems.org/

biome: Ecosystem type according to IUCN GET https://global-ecosystems.org/

functional_group: Ecosystem type according to IUCN GET https://global-ecosystems.org/

comments

deployments

dataset_ID: primary key of datasets table

dataset_name: lookup field

deployment: use identical subscript letters to denote rows that belong to the same deployment. For instance, you may use different operation times and schedules for different target taxa within one deployment.

start_date_min: earliest date of deployment start, double-click cell to get date-picker

start_date_max: latest date of deployment start, if applicable (only used when recorders were deployed over several days), double-click cell to get date-picker

start_time_mixed: deployment start local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). Corresponds to the recording start time for continuous recording deployments. If multiple start times were used, you should mention the latest start time (corresponds to the earliest daytime from which all recorders are active). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

permanent: is the deployment permanent (in which case it would be ongoing and the end date or duration would be unknown)?

variable_duration_days: is the duration of the deployment variable? in days

duration_days: deployment duration per recorder (use the minimum if variable)

end_date_min: earliest date of deployment end, only needed if duration is variable, double-click cell to get date-picker

end_date_max: latest date of deployment end, only needed if duration is variable, double-click cell to get date-picker

end_time_mixed: deployment end local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). Corresponds to the recording end time for continuous recording deployments.

recording_time: does the recording last from the deployment start time to the end time (continuous) or at scheduled daily intervals (scheduled)? Note: we consider recordings with duty cycles to be continuous.

operation_start_time_mixed: scheduled recording start local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

operation_duration_minutes: duration of operation in minutes, if constant

operation_end_time_mixed: scheduled recording end local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

duty_cycle_minutes: duty cycle of the recording (i.e. the fraction of minutes when it is recording), written as "recording(minutes)/period(minutes)". For example: "1/6" if the recorder is active for 1 minute and standing by for 5 minutes.

sampling_frequency_kHz: only indicate the sampling frequency if it is variable within a particular dataset so that we need to code different frequencies for different deployments

recorder

subset_sites: If the deployment was not done in all the sites of the
Travel time to cities and ports in the year 2015
figshare.com
tiff
Updated May 30, 2023
Andy Nelson (2023). Travel time to cities and ports in the year 2015 [Dataset]. http://doi.org/10.6084/m9.figshare.7638134.v4
Available download formats: tiff
Unique identifier
https://doi.org/10.6084/m9.figshare.7638134.v4
Dataset updated
May 30, 2023
Dataset provided by
figshare
Authors
Andy Nelson
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset and the validation are fully described in a Nature Scientific Data Descriptor https://www.nature.com/articles/s41597-019-0265-5

If you want to use this dataset in an interactive environment, then use this link https://mybinder.org/v2/gh/GeographerAtLarge/TravelTime/HEAD

The following text is a summary of the information in the above Data Descriptor.

The dataset is a suite of global travel-time accessibility indicators for the year 2015, at approximately one-kilometre spatial resolution for the entire globe. The indicators show an estimated (and validated), land-based travel time to the nearest city and nearest port for a range of city and port sizes.

The datasets are in GeoTIFF format and are suitable for use in Geographic Information Systems and statistical packages for mapping access to cities and ports and for spatial and statistical analysis of the inequalities in access by different segments of the population.

These maps represent a unique global representation of physical access to essential services offered by cities and ports.

travel_time_to_cities_x.tif (x ranges from 1 to 12)

The value of each pixel is the estimated travel time in minutes to the nearest urban area in 2015. There are 12 data layers based on different sets of urban areas, defined by their population in year 2015 (see PDF report).

travel_time_to_ports_x (x ranges from 1 to 5)

The value of each pixel is the estimated travel time to the nearest port in 2015. There are 5 data layers based on different port sizes.

Format: Raster Dataset, GeoTIFF, LZW compressed

Unit: Minutes

Data type: Byte (16 bit Unsigned Integer)

No data value: 65535

Flags: None

Spatial resolution: 30 arc seconds

Spatial extent: Upper left -180, 85; Lower left -180, -60; Upper right 180, 85; Lower right 180, -60

Spatial Reference System (SRS): EPSG:4326 - WGS84 - Geographic Coordinate System (lat/long)

Temporal resolution: 2015

Temporal extent: Updates may follow for future years, but these are dependent on the availability of updated inputs on travel times and city locations and populations.

Methodology

Travel time to the nearest city or port was estimated using an accumulated cost function (accCost) in the gdistance R package (van Etten, 2018). This function requires two input datasets: (i) a set of locations to estimate travel time to and (ii) a transition matrix that represents the cost or time to travel across a surface.

The set of locations were based on populated urban areas in the 2016 version of the Joint Research Centre’s Global Human Settlement Layers (GHSL) datasets (Pesaresi and Freire, 2016) that represent low density (LDC) urban clusters and high density (HDC) urban areas (https://ghsl.jrc.ec.europa.eu/datasets.php). These urban areas were represented by points, spaced at 1km distance around the perimeter of each urban area.

Marine ports were extracted from the 26th edition of the World Port Index (NGA, 2017), which contains the location and physical characteristics of approximately 3,700 major ports and terminals. Ports are represented as single points.

The transition matrix was based on the friction surface (https://map.ox.ac.uk/research-project/accessibility_to_cities) from the 2015 global accessibility map (Weiss et al, 2018).

Code

The R code used to generate the 12 travel time maps is included in the zip file that can be downloaded with these data layers. The processing zones are also available.

Validation

The underlying friction surface was validated by comparing travel times between 47,893 pairs of locations against journey times from a Google API. Our estimated journey times were generally shorter than those from the Google API. Across the tiles, the median journey time from our estimates was 88 minutes within an interquartile range of 48 to 143 minutes while the median journey time estimated by the Google API was 106 minutes within an interquartile range of 61 to 167 minutes. Across all tiles, the differences were skewed to the left and our travel time estimates were shorter than those reported by the Google API in 72% of the tiles. The median difference was −13.7 minutes within an interquartile range of −35.5 to 2.0 minutes while the absolute difference was 30 minutes or less for 60% of the tiles and 60 minutes or less for 80% of the tiles. The median percentage difference was −16.9% within an interquartile range of −30.6% to 2.7% while the absolute percentage difference was 20% or less in 43% of the tiles and 40% or less in 80% of the tiles.

This process and results are included in the validation zip file.

Usage Notes

The accessibility layers can be visualised and analysed in many Geographic Information Systems or remote sensing software such as QGIS, GRASS, ENVI, ERDAS or ArcMap, and also by statistical and modelling packages such as R or MATLAB. They can also be used in cloud-based tools for geospatial analysis such as Google Earth Engine.

The nine layers represent travel times to human settlements of different population ranges. Two or more layers can be combined into one layer by recording the minimum pixel value across the layers. For example, a map of travel time to the nearest settlement of 5,000 to 50,000 people could be generated by taking the minimum of the three layers that represent the travel time to settlements with populations between 5,000 and 10,000, 10,000 and 20,000 and, 20,000 and 50,000 people.
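The minimum-across-layers combination described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it uses small nested lists in place of real GeoTIFF rasters, and the function name `combine_min` and the sample grids are hypothetical. It relies only on the stated no-data value of 65535, which is the largest 16-bit unsigned value, so a plain minimum keeps no-data only where every input layer has no data.

```python
NODATA = 65535  # no-data value stated in the dataset description


def combine_min(layers):
    """Pixel-wise minimum across equally sized 2D grids of travel times (minutes)."""
    if not layers:
        raise ValueError("at least one layer is required")
    return [
        [min(pixels) for pixels in zip(*rows)]  # same pixel across all layers
        for rows in zip(*layers)                # same row across all layers
    ]


# Hypothetical 2x2 grids standing in for three population-range layers.
layer_a = [[10, NODATA], [300, NODATA]]
layer_b = [[25, 40], [120, NODATA]]
layer_c = [[5, 90], [NODATA, NODATA]]

combined = combine_min([layer_a, layer_b, layer_c])
print(combined)  # [[5, 40], [120, 65535]]
```

For full-resolution rasters the same pixel-wise minimum would normally be computed with an array or GIS library rather than Python lists.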

The accessibility layers also permit user-defined hierarchies that go beyond computing the minimum pixel value across layers. A user-defined complete hierarchy can be generated when the union of all categories adds up to the global population, and the intersection of any two categories is empty. Everything else is up to the user in terms of logical consistency with the problem at hand.

The accessibility layers are relative measures of the ease of access from a given location to the nearest target. While the validation demonstrates that they do correspond to typical journey times, they cannot be taken to represent actual travel times. Errors in the friction surface will be accumulated as part of the accumulative cost function and it is likely that locations that are further away from targets will have a greater divergence from a plausible travel time than those that are closer to the targets. Care should be taken when referring to travel time to the larger cities when the locations of interest are extremely remote, although they will still be plausible representations of relative accessibility. Furthermore, a key assumption of the model is that all journeys will use the fastest mode of transport and take the shortest path.
CarDA - Car door Assembly Activities Dataset
zenodo.org
bin, pdf
Updated Jan 15, 2025
Konstantinos Papoutsakis; Nikolaos Bakalos; Athena Zacharia; Maria Pateraki (2025). CarDA - Car door Assembly Activities Dataset [Dataset]. http://doi.org/10.5281/zenodo.14644367
Available download formats: pdf, bin
Unique identifier
https://doi.org/10.5281/zenodo.14644367
Dataset updated
Jan 15, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Konstantinos Papoutsakis; Nikolaos Bakalos; Athena Zacharia; Maria Pateraki
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The CarDA dataset [1] (Car Door Assembly dataset) has been designed and captured to provide a comprehensive, multi-modal resource for analyzing car door assembly activities performed by trained line workers in realistic assembly lines.

It comprises a set of time-synchronized multi-camera RGB-D videos and human motion capture data acquired during car door assembly activities performed by real-line workers in a real manufacturing environment.

Deployment environment:

The use-case scenario concerns a real-world assembly line workplace in an automotive manufacturing industry, as the deployment environment. In this context,
line workers simulate the real car door assembly workflow using the prompts, sequences, and tools under very similar ergonomic and environmental conditions
as in existing factory shop floors.

The assembly line involves a conveyor belt that is divided into three virtual work areas corresponding to three assembly workstations (WS10, WS20, and WS30). The belt moves at a low, constant speed, supporting cart-mounted car doors and material storage. A line worker is assigned to each workstation, and all workers assemble car doors as the belt moves. Each worker completes a workstation-specific set of assembly actions, referred to as a task cycle. Upon the successful completion of a task cycle, the cart travels to the virtually defined area of the subsequent workstation, where another line worker continues the assembly process in a new task cycle. Each task cycle lasts approximately 4 minutes and is continuously repeated during the worker's shift.

Data acquisition:

Data acquisition involves low-cost, passive RGB-D camera sensors that are installed at stationary locations alongside the car door assembly line and a motion
capture system for capturing time-synchronized sequences of images and motion capture data during car door assembly activities performed by real line workers.

Two stationary StereoLabs ZED2 stereo cameras were installed in each of the three workstations of the car door assembly line. The two stationary, workstation-specific cameras are located at bilateral positions on the two sides of the conveyor belt at the center of the area concerning that specific workstation.

The pair of RGB-D sensors were utilized to acquire stereo color and depth image sequences during car door task cycle executions. Each recording comprises
time-synchronized RGB (color) and depth image sequences captured throughout a task cycle execution at 30 frames per second (fps).

At the same time, the line worker used a wearable XSens MVN Link suit during work activities to acquire time-synced 3D motion capture data at 60 fps.

Note: Time synchronization between pairs of RGB-D (.svo) recordings (pairs captured during an assembly task cycle simultaneously from the inXX and outXX cameras installed by the wsXX) is guaranteed and relies on the StereoLabs ZED SDK acquisition software. Time synchronization between samples of the RGB-D and mp4 videos (30 fps) and the acquired motion capture data (60 fps) was performed manually with the starting frame/time of the video as a reference time. We have observed some time discrepancies between data samples of the two modalities that might occur after the first 40-50 seconds in some recordings.
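The relationship between the two sampling rates in the note above can be illustrated with a short sketch. It assumes, as the note describes, that the first video frame is the shared reference time; the function name `mocap_index_for_video_frame` and the optional `offset_s` correction are hypothetical and not part of the dataset tooling. Given the reported discrepancies after 40-50 seconds in some recordings, such a mapping should be treated as approximate.

```python
VIDEO_FPS = 30  # frame rate of the RGB-D and mp4 videos, per the description
MOCAP_FPS = 60  # frame rate of the motion capture data, per the description


def mocap_index_for_video_frame(video_frame: int, offset_s: float = 0.0) -> int:
    """Return the motion capture sample nearest in time to a video frame.

    offset_s is a hypothetical manual correction (in seconds) added to the
    video timestamp; 0.0 means both streams start at the same instant.
    """
    timestamp_s = video_frame / VIDEO_FPS + offset_s
    return round(timestamp_s * MOCAP_FPS)


# With no offset, every video frame maps to every second mocap sample.
print(mocap_index_for_video_frame(0))   # 0
print(mocap_index_for_video_frame(45))  # 90
```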

CarDA Dataset:

The dataset has been split into two subsets, A and B.

Each comprises data acquired at different periods using the same multicamera system in the same manufacturing environment.

Subset A contains recordings of RGB-D videos, mp4 videos, and 3D human motion capture data (using the XSens MVN Link suit) acquired during car door assembly activities in all three workstations.

Subset B contains recordings of RGB-D videos and mp4 videos acquired during car door assembly activities in all three workstations.

CarDA subset A

It contains:

RGB-D videos acquired using StereoLabs ZED 2 sensors, in .svo format

mp4 videos (30fps) extracted from the .svo files (using the left camera of the stereo pair of each camera).

3D human pose data (ground truth) captured using the Movella Xsens MVN Link motion capture system (60 fps) in .bvh format

Annotation data (xls file format):

Ground truth related to temporal segmentation and classification of car door assembly actions (subgoals) during task cycle executions, performed by personnel working directly on the assembly line for the CarDA dataset.

Ground truth data on the duration of basic ergonomic postures based on the EAWS ergonomic screening tool: Two experts in manufacturing and ergonomics performed manual annotations related to the EAWS screening tool.

CarDA subset A files:

ws10 - svo - mp4 - bvh.rar
Five assembly task cycle executions recorded in WS10, containing pairs of RGB-D videos (.svo) acquired by two different StereoLabs ZED 2 stereo cameras and .bvh motion capture data acquired using the XSens Link system. Annotation data are also available.

ws20 - svo - mp4 - bvh.rar
Four assembly task cycle executions recorded in WS20, containing pairs of RGB-D videos (.svo) acquired by two different StereoLabs ZED 2 stereo cameras and .bvh motion capture data acquired using the XSens Link system. Annotation data are also available.

ws30 - svo - mp4 - bvh.rar
Four assembly task cycle executions recorded in WS30, containing pairs of RGB-D videos (.svo) acquired by two different StereoLabs ZED 2 stereo cameras and .bvh motion capture data acquired using the XSens Link system. Annotation data are also available.

CarDA subset B

It contains:

RGB-D videos acquired using StereoLabs ZED 2 sensors, in .svo format

mp4 videos (30fps) extracted from the .svo files (using the left camera of the stereo pair of each camera).

Annotation data (xls file format):

Ground truth related to temporal segmentation and classification of car door assembly actions (subgoals) during task cycle executions, performed by personnel working directly on the assembly line for the CarDA dataset.

Ground truth data on the duration of basic ergonomic postures based on the EAWS ergonomic screening tool: Two experts in manufacturing and ergonomics performed manual annotations related to the EAWS screening tool.

CarDA subset B files:

ws10 - svo - mp4.rar
Three pairs of RGB-D videos (.svo) acquired by two different StereoLabs ZED 2 stereo cameras placed in the real workplace are provided.

ws20 - svo - mp4.rar
Six pairs of RGB-D videos (.svo) acquired by two different StereoLabs ZED 2 stereo cameras placed in the real workplace are provided.

ws30 - svo - mp4.rar
Three pairs of RGB-D videos (.svo) acquired by two different StereoLabs ZED 2 stereo cameras placed in the real workplace are provided.

Contact:

Konstantinos Papoutsakis, PhD: papoutsa@ics.forth.gr

Maria Pateraki: mpateraki@mail.ntua.gr
Assistant Professor | National Technical University of Athens
Affiliated Researcher | Institute of Computer Science | FORTH

References:

[1] Konstantinos Papoutsakis, Nikolaos Bakalos, Konstantinos Fragkoulis, Athena Zacharia, Georgia Kapetadimitri, and Maria Pateraki. A vision-based framework for human behavior understanding in industrial assembly lines. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops - T-CAP 2024 Towards a Complete Analysis of People: Fine-grained Understanding for Real-World Applications, 2024.
Autism Spectrum Condition Multimodal Embodiment Open Repository (ASCMEOR)
zenodo.org
explore.openaire.eu
Updated Mar 2, 2021
Batuhan Sayis; Rafael Ramirez; Narcis Pares (2021). Autism Spectrum Condition Multimodal Embodiment Open Repository (ASCMEOR) [Dataset]. http://doi.org/10.5281/zenodo.4557383
Unique identifier
https://doi.org/10.5281/zenodo.4557383
Dataset updated
Mar 2, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Batuhan Sayis; Rafael Ramirez; Narcis Pares
Description
Autism Spectrum Condition (ASC) is still poorly understood. As its name suggests, it covers a very broad range of conditions, which makes it difficult to define and deal with. Important research is being undertaken in genetics to try to understand its origin, cause and typification, but no major advances have been made yet.

In the meantime, ASC people need pragmatic solutions to help them in a number of aspects of their daily lives, such as social interaction. ICT has become an important source of intervention and therapeutic tools in the last 10 years. There is a complete lack of sharing of data from trials of ICT tools for ASC. This data could be useful to many researchers to compare results and to build research in different directions from that same data. Within the available ICT tools Embodied Interaction is increasingly showing its potential in ASC. Data from these tools is multimodal in nature and is hence complex to store and analyze.

In our project we investigated a technological system, a full-body interactive Mixed Reality (MR) experience, to understand how full-body interactive systems can help children with Autism improve in social initiation behaviors. The approach of our project was to compare results from our MR experience with a typical LEGO based social intervention, where both mediate a face-to-face play session between an ASC child and a non-ASC child.

The project created a database called ASCMEOR which is a reference database of multimodal data from sessions of ASC children and youngsters using ICT therapy and intervention tools. This is the first time that this type of data is collected from ASC children interacting with complex ICT systems in a database and shared with experts around the world.

As a result of a collaboration with the “Multidisciplinary Unit on Autism Spectrum Disorder” of the Hospital Sant Joan de Déu, the unit provided links to the end users (i.e. high-functioning ASC children) on a local basis in the city of Barcelona. The demography was defined as children and young teenagers (8-12 years old). Participants had been formally diagnosed with ASC as determined by the Autism Diagnostic Observation Schedule (ADOS) module 3, which is designed for young people with verbal fluency, with a minimum diagnosed severity of 4. Verbal fluency was essential to achieve the level of collaboration required to play the game without the help of a psychologist or parent. As a measure to prevent problems playing or comprehending the game, both the ASC and non-ASC children had to have a minimum IQ of 70 according to the Wechsler Intelligence Scale for Children (WISC) and were screened for epilepsy. All procedures performed were in accordance with the 1964 Helsinki declaration and its later amendments or comparable ethical standards, and ethical approval was obtained from the ethical committee of the hospital and Universitat Pompeu Fabra. Informed consent was obtained from the legal representatives of all participants included in the study.

The experimental procedure was run with 36 ASC/non-ASC dyads following a repeated-measure design with two conditions: Full-body interaction MR environment and the typical social intervention strategy based on LEGO bricks. The children with ASC played with their non-ASC partner for 15 minutes in the MR system, and with the same partner for 15 minutes in the LEGO setup. All children participated in both experimental conditions, and the order was randomized for each pair to counterbalance any learning effects. There was a 5 min break followed by a relaxation training between the 2 conditions. As a result of the experimental trials, the ASCMEOR dataset has been generated. Each experimental trial has a trial no associated with it e.g., the first trial has the trial no: “0001”. The data from each trial is organized in the following format:

(0) Experiment Timeline

(1) Video-coding of overt behaviors;

(2) System log files detailing system triggering of events;

(3) Questionnaires to the children and to the parents;

(4) Psychophysiological measures, i.e. electrocardiogram (ECG), electrodermal activity (EDA) and accelerometer (ACC)
Data from: S1 Dataset -
plos.figshare.com
xlsx
Updated Jul 10, 2024
S1 Dataset - [Dataset]. https://plos.figshare.com/articles/dataset/S1_Dataset_-/26237729
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0306877.s001
Dataset updated
Jul 10, 2024
Dataset provided by
PLOS ONE
Authors
Leandra Nagler; Carmen Eißmann; Marita Wasenitz; Franz Bahlmann; Ammar Al Naimi
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The prevalence of overweight and obese people worldwide has dramatically increased in the last decades and is yet to peak. At the same time, and partly due to obesity and associated assisted reproduction, twinning rates have shown a clear rise in recent years. Adverse fetomaternal outcomes are known to occur in singleton and twin pregnancies in overweight and obese women. However, the impact of the obesity levels as defined by the World Health Organization on the outcomes of twin pregnancies has not been thoroughly studied. Therefore, the purpose of this study is to examine how maternal overweight and the level of obesity affect fetomaternal outcomes in twin pregnancies, hypothesizing a higher likelihood for adverse outcomes with overweight and each obesity level. This is a retrospective cohort study with 2,349 twin pregnancies that delivered at the Buergerhospital Frankfurt, Germany between 2005 and 2020. The mothers were divided into exposure groups depending on their pre-gestational body mass index; these were normal weight (reference group), overweight and obesity levels I, II, and III. A multivariate logistic regression analysis was performed to assess the influence of overweight and obesity on gestational diabetes mellitus, preeclampsia, postpartum hemorrhage, intrauterine fetal death, and a five-minute Apgar score below seven. The adjusted odds ratios for gestational diabetes compared to normal weight mothers were 1.47, 2.79, 4.05, and 6.40 for overweight and obesity levels I, II and III respectively (p = 0.015 for overweight and p < 0.001 for each obesity level). Maternal BMI had a significant association with the risk of preeclampsia (OR 1.04, p = 0.028). Overweight and obesity did not affect the odds of postpartum hemorrhage, fetal demise, or a low Apgar score.
While maternal overweight and obesity did not influence the fetal outcomes in twin pregnancies, they significantly increased the risk of gestational diabetes and preeclampsia, and that risk is incremental with increasing level of obesity.
Intracity Fare Estimation
kaggle.com
Updated Sep 16, 2022
Gaurav Dutta (2022). Intracity Fare Estimation [Dataset]. https://www.kaggle.com/datasets/gauravduttakiit/intracity-fare-estimation
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 16, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Gaurav Dutta
Description
Intracity Fare Estimation

Chennai (formerly known as Madras) is located on the Coromandel Coast off the Bay of Bengal. People from all over the world come to this marvelous city to spend their holidays, enjoy the natural splendor, and collect unforgettable memories.

Recently, Devesh Sethia went to Chennai to meet his friend Kartikay Singh, who is a reputed minister of the Ministry of Transport and Highways, India. While he was there, he was cheated by drivers many times. He asked his friend Kartikay to look into the matter and suggested that he declare a fare rate for different modes of transport in a city. Kartikay liked the idea, but analyzing these fares was a difficult task for him. For this, he called one of his college friends, Mohammad Ausaf Jafri, who has deep knowledge of Machine Learning, and provided him with a dataset containing 20k samples of intracity fares in different cities of India.

He was given a month to come up with the best possible solution. But these days he is so busy with his office work that he is not getting much time to spend on analyzing the dataset. So he needs help from others. Help him solve the problem.

We are providing a training dataset (of 20k samples) describing trips for the past 2 years (from 01/01/2015 to 31/12/2016) for different modes of transport in different cities of India. Each data sample corresponds to one complete trip. It contains 11 features, as follows:

Data Dictionary

Here's a brief version of what you'll find in the data description file.

ID: a unique identifier for each sample

TIMESTAMP (Datetime): trip start time, in the format (Year)-(Month)-(Day) (Hours):(Minutes):(Seconds)

STARTING_LATITUDE (Float): latitude of the trip start position, in degrees North

STARTING_LONGITUDE (Float): longitude of the trip start position, in degrees East

DESTINATION_LATITUDE (Float): latitude of the trip stop position, in degrees North

DESTINATION_LONGITUDE (Float): longitude of the trip stop position, in degrees East

VEHICLE_TYPE (String): type of transport vehicle used for the trip

TOTAL_LUGGAGE_WEIGHT (Float): total luggage carried by the passenger, in kilograms

WAIT_TIME (Float): time the driver waited for the passenger before the start of the trip, in minutes

TRAFFIC_STUCK_TIME (Integer): time the vehicle waited in traffic, in minutes

DISTANCE (Integer): total distance covered in the trip, in kilometres

FARE: trip cost
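The TIMESTAMP format described in the data dictionary can be read with Python's standard library. The sketch below is only an illustration: the format string "%Y-%m-%d %H:%M:%S" is an assumption inferred from the textual description, the helper name `parse_trip_start` is hypothetical, and the sample value is made up rather than taken from the dataset.

```python
from datetime import datetime

# Assumed rendering of "(Year)-(Month)-(Day) (Hours):(Minutes):(Seconds)".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_trip_start(value: str) -> datetime:
    """Parse a TIMESTAMP string into a datetime (assumed format, see above)."""
    return datetime.strptime(value.strip(), TIMESTAMP_FORMAT)


# Made-up sample value inside the stated 01/01/2015 to 31/12/2016 range.
start = parse_trip_start("2015-03-14 08:05:30")
print(start.year, start.month, start.hour)  # 2015 3 8
```

If the real file uses a different separator or omits seconds, `strptime` raises a ValueError, which makes a format mismatch easy to spot early.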
Instagram: countries with the highest audience reach 2024
statista.com
davegsmith.com
Updated Jun 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Stacy Jo Dixon (2025). Instagram: countries with the highest audience reach 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
Dataset updated
Jun 17, 2025
Dataset provided by
Statista (http://statista.com/)
Authors
Stacy Jo Dixon
Description
As of April 2024, Bahrain was the country with the highest Instagram audience reach with 95.6 percent. Kazakhstan also had a high Instagram audience penetration rate, with 90.8 percent of the population using the social network. In the United Arab Emirates, Turkey, and Brunei, the photo-sharing platform was used by more than 85 percent of each country's population.
Data from: The global distribution of plants used by humans datasets: list...
data.niaid.nih.gov
Updated Jan 19, 2024
Govaerts, Rafaël (2024). The global distribution of plants used by humans datasets: list of utilised species, occurrence data and model outputs at 10 arc-minutes spatial resolution [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8176317
Dataset updated
Jan 19, 2024
Dataset provided by
Antonelli, Alexander
Willis, Kathy J.
Dennehy-Carr, Zoe
Lemmens, Roel
Nesbitt, Mark
Cámara-Leret, Rodrigo
Govaerts, Rafaël
Ondo, Ian
Turner, Rob M.
van Andel, Tinde R.
Schmelzer, Gaby
Pironon, Samuel
Canteiro, Cátia
Hargreaves, Serene
Patmore, Kristina
Diazgranados, Mauricio
Baquero, Andrea C.
Milliken, William
Hudson, Alex J.
Ulian, Tiziana
Allkin, Robert
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Datasets and model outputs used to map the global distribution of utilised plants by humans. The folder is composed of two subfolders raw_data and processed_data containing respectively the list of utilised plant species modelled -utilised_plants_species_list.csv-, and their occurrence data -occurrence_data.zip- and predicted distribution -species_proba_per_cell.rds-.

The file utilised_plants_species_list.csv in the raw_data folder contains a list of 35687 plant species (and hybrids) used by humans and 10 plant use categories with the following 14 fields:

plant_ID: plant identifier number ranging from 1 to 35687

binomial_acc_name: binomial accepted name of the plant species

author_acc_name: name of the author(s)

is_hybrid: logical TRUE or FALSE indicating whether the species is a hybrid or not.

AnimalFood: forage and fodder for vertebrate animals only.

EnvironmentalUses: examples include intercrops and nurse crops, ornamentals, barrier hedges, shade plants, windbreaks, soil improvers, plants for revegetation and erosion control, wastewater purifiers, indicators of the presence of metals, pollution, or underground water.

Fuels: charcoal, petroleum substitutes, fuel alcohols, etc. Given the importance of energy plants for people, those were distinguished from Materials.

GeneSources: wild relatives of major crops which may possess traits associated with biotic or abiotic resistance and may be valuable for breeding programs.

HumanFood: food for humans only, including beverages and food additives.

InvertebrateFood: plants consumed by invertebrates used by humans, such as bees, silkworms, lac insects and edible grubs.

Materials: woods, fibers, cork, cane, tannins, latex, resins, gums, waxes, oils, lipids, etc. and their derived products.

Medicines: both human and veterinary.

Poisons: plants which are poisonous to both vertebrates and invertebrates, both accidentally and intentionally, e.g., for hunting and fishing, molluscicides, herbicides, insecticides.

SocialsUses: plants used for social purposes, which cannot be defined as food or medicine, for instance, masticatories, smoking materials, narcotics, hallucinogens and psychoactive drugs, and plants with ritual or religious significance.

Totals: total number of uses recorded for a species

The zipfile occurrence_data.zip in the processed_data folder contains 35687 Comma Separated Values (CSV) files, one for each species, containing curated geographic occurrence records used to build species distribution models with the following 14 fields:

Species: the binomial accepted name of the species

Fullname: same as species

decimalLongitude: the geographic longitude of the occurrence records of the species in decimal degrees

decimalLatitude: the geographic latitude of the occurrence records of the species in decimal degrees

countryCode: a three-letter standard abbreviation for the country of the occurrence locality

coordinateUncertaintyinMeters: indicator for the accuracy of the coordinate location, described as the radius of a circle around the stated point location

year: year of the observation of the occurrence record of the species

individualCount: the number of individuals present at the time of the observation

gbifID: unique identifier number for the occurrence from the original database

basisOfRecords: the type of the individual record, e.g. observation, physical specimen, fossil, living ex-situ, culture collection specimen

institutionCode: the name of the institution or organization listed as the data publisher on GBIF

establishmentMeans: statement about whether an organism has been introduced to a given place and time through the direct or indirect activity of modern humans

is_cultivated_observation: whether or not an organism is cultivated

sourceID: name of the source database

The file species_proba_per_cell.rds in the processed_data folder is an R Data Serialization (RDS) file containing a data.table object with the following 3 fields:

plant_ID: plant identifier number ranging from 1 to 35687

proba: species occurrence probability

cell: raster grid cell number between 1-2251762

This object can be used in combination with a raster layer to reconstruct the modelled distribution of each species or retrieve species richness and endemism.
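
As a rough illustration of the species-richness use mentioned above, the sketch below sums occurrence probabilities per grid cell. It works in plain Python on a few made-up (plant_ID, proba, cell) rows rather than on the actual .rds file. Treating the per-cell sum of probabilities as an expected richness is an assumption made here, not necessarily the authors' method, and the function name richness_per_cell is hypothetical.

```python
from collections import defaultdict


def richness_per_cell(records):
    """Sum occurrence probabilities per grid cell.

    records: iterable of (plant_ID, proba, cell) tuples, mirroring the
    three fields described for species_proba_per_cell.rds.
    Returns a dict mapping cell number -> summed probability.
    """
    totals = defaultdict(float)
    for _plant_id, proba, cell in records:
        totals[cell] += proba
    return dict(totals)


# Toy rows, invented for illustration only (not taken from the dataset).
records = [
    (1, 0.9, 10),
    (2, 0.5, 10),
    (1, 0.2, 11),
]
richness = richness_per_cell(records)
```

Mapping the resulting cell numbers back to coordinates would still require the matching raster layer, which is not described in enough detail here to reconstruct.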

Instagram: distribution of global audiences 2024, by age and gender

statista.com
davegsmith.com

Updated Jun 17, 2025

Stacy Jo Dixon (2025). Instagram: distribution of global audiences 2024, by age and gender [Dataset]. https://www.statista.com/topics/1164/social-networks/

Dataset updated

Jun 17, 2025

Dataset provided by

Statista (http://statista.com/)

Authors

Stacy Jo Dixon

Description

As of April 2024, around 16.5 percent of global active Instagram users were men between the ages of 18 and 24 years. More than half of Instagram users worldwide were aged 34 years or younger.

Teens and social media

As one of the biggest social networks worldwide, Instagram is especially popular with teenagers. As of fall 2020, the photo-sharing app ranked third in terms of preferred social network among teenagers in the United States, behind Snapchat and TikTok. Instagram was one of the most influential advertising channels among female Gen Z users when making purchasing decisions. Teens report feeling more confident, popular, and better about themselves when using social media, and less lonely, depressed and anxious.
Social media can also have negative effects on teens, and these effects are much more pronounced among those with low emotional well-being. Thirty-five percent of teenagers with low social-emotional well-being reported having experienced cyberbullying when using social media, compared with only five percent of teenagers with high social-emotional well-being. As such, social media can have a big impact on already fragile states of mind.

Facebook: countries with the highest Facebook reach 2024
statista.com
davegsmith.com
Updated Jun 17, 2025
Stacy Jo Dixon (2025). Facebook: countries with the highest Facebook reach 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
Dataset updated
Jun 17, 2025
Dataset provided by
Statista (http://statista.com/)
Authors
Stacy Jo Dixon
Description
As of April 2024, Facebook had an addressable ad audience reach of 131.1 percent in Libya, followed by the United Arab Emirates with 120.5 percent and Mongolia with 116 percent. Additionally, the Philippines and Qatar had addressable ad audiences of 114.5 percent and 111.7 percent, respectively.
Live births, by month
www150.statcan.gc.ca
open.canada.ca
Updated Sep 25, 2024
Live births, by month [Dataset]. https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1310041501
Unique identifier
https://doi.org/10.25318/1310041501-eng
Dataset updated
Sep 25, 2024
Dataset provided by
Government of Canada (http://www.gg.ca/)
Statistics Canada (https://statcan.gc.ca/en)
Area covered
Canada
Description
Number and percentage of live births, by month of birth, 1991 to most recent year.
Dataset related to this study.
plos.figshare.com
bin
Updated Jun 6, 2024
Erean Shigign Malka; Tarekegn Solomon; Dejene Hailu Kassa; Besfat Berihun Erega; Derara Girma Tufa (2024). Dataset related to this study. [Dataset]. http://doi.org/10.1371/journal.pone.0302665.s002
Available download formats: bin
Unique identifier
https://doi.org/10.1371/journal.pone.0302665.s002
Dataset updated
Jun 6, 2024
Dataset provided by
PLOS ONE
Authors
Erean Shigign Malka; Tarekegn Solomon; Dejene Hailu Kassa; Besfat Berihun Erega; Derara Girma Tufa
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction
The largest risk of child mortality occurs within the first week after birth. Early neonatal mortality remains a global public health concern, especially in sub-Saharan African countries. More than 75% of neonatal deaths occur within the first seven days of birth, but there are limited prospective follow-up studies to determine time to death, incidence and predictors of death in Ethiopia, particularly in the study area. The study aimed to determine the incidence and predictors of early neonatal mortality among neonates admitted to the neonatal intensive care units of Addis Ababa public hospitals, Ethiopia, 2021.

Methods
An institutional prospective cohort study was conducted in four public hospitals in Addis Ababa City, Ethiopia, from June 7th, 2021 to July 13th, 2021. All early neonates consecutively admitted to the neonatal intensive care units of the selected hospitals were included in the study and followed until 7 days old. Data were coded, cleaned, edited, and entered into EpiData version 3.1 and then exported to STATA version 14.0 for analysis. The Kaplan-Meier survival curve with the log-rank test was used to compare survival time between groups. Both bi-variable and multivariable Cox proportional hazards regression models were used to identify the predictors of early neonatal mortality. All variables with a P-value ≤0.2 in the bi-variable analysis were further fitted to the multivariable model. The model assumption was checked graphically and using a global test. The goodness of fit of the model was assessed using the Cox-Snell residual test and it was adequate.

Results
A total of 391 early neonates with their mothers were involved in this study. The incidence rate among admitted early neonates was 33.25 per 1000 neonate-days of observation [95% confidence interval (CI): 26.22, 42.17]. Preterm birth [adjusted hazard ratio (AHR): 6.0 (95% CI: 2.02, 17.50)], a low fifth-minute Apgar score [AHR: 3.93 (95% CI: 1.5, 6.77)], low temperature [AHR: 2.67 (95% CI: 1.41, 5.02)], and resuscitation of the early neonate [AHR: 2.80 (95% CI: 1.51, 5.10)] were associated with an increased hazard of early neonatal death. However, early neonatal crying at birth [AHR: 0.48 (95% CI: 0.26, 0.87)] was associated with a reduced hazard of death.

Conclusions
Early neonatal mortality is high in Addis Ababa public hospitals. Preterm birth, low five-minute Apgar score, hypothermia and crying at birth were found to be independent predictors of early neonatal death. Good care and attention are needed for neonates with low Apgar scores and for premature and hypothermic neonates.

Taka, Maija (2024). Data for: World's human migration patterns in 2000-2019 unveiled by high-resolution data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7997133

Data for: World's human migration patterns in 2000-2019 unveiled by high-resolution data

Dataset updated

Jul 11, 2024

Dataset provided by

Kummu, Matti
Virkki, Vili
Heino, Matias
Varis, Olli
Muttarak, Raya
Niva, Venla
Horton, Alexander
Abel, Guy J
Taka, Maija
Kinnunen, Pekka
Kallio, Marko

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Area covered

World

Description

This dataset provides a global gridded (5 arc-min resolution) annual net-migration dataset for 2000-2019. We also provide the global annual birth and death rate datasets that were used to estimate net-migration for the same years. The dataset is presented in detail, with some further analyses, in the following publication. Please cite this paper when using the data.

Niva et al. 2023. World's human migration patterns in 2000-2019 unveiled by high-resolution data. Nature Human Behaviour 7: 2023–2037. Doi: https://doi.org/10.1038/s41562-023-01689-4

You can explore the data in our online net-migration explorer: https://wdrg.aalto.fi/global-net-migration-explorer/

Short introduction to the data

For the dataset, we collected, gap-filled, and harmonised:

comprehensive national-level birth and death rate datasets for a total of 216 countries or sovereign states; and

sub-national data for births (data covering 163 countries, divided altogether into 2555 admin units) and deaths (123 countries, 2067 admin units).

These birth and death rates were downscaled with selected socio-economic indicators to a 5 arc-min grid for each year in 2000-2019. This allowed us to calculate the 'natural' population change; by comparing it with the reported changes in population, we were able to estimate the annual net-migration. See Niva et al. (2023) for more about the methods and calculations.
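
The residual idea described above (reported population change minus natural change) can be sketched as follows. This is a minimal illustration under stated assumptions: all rates are expressed per 1000 people per year relative to the start-of-year population, and the function name and toy numbers are hypothetical. The exact procedure in Niva et al. (2023) may differ.

```python
def net_migration_rate(pop_start, pop_end, birth_rate, death_rate):
    """Estimate net-migration per 1000 people per year as a residual.

    pop_start, pop_end: reported population at the start and end of the year.
    birth_rate, death_rate: births and deaths per 1000 people per year.
    Assumption: rates are relative to the start-of-year population.
    """
    # Reported total change, expressed per 1000 people.
    total_change = (pop_end - pop_start) / pop_start * 1000.0
    # 'Natural' change is births minus deaths.
    natural_change = birth_rate - death_rate
    # Whatever the natural change does not explain is attributed to migration.
    return total_change - natural_change


# Toy example with invented numbers: population grows by 15 per 1000,
# natural change accounts for 12 per 1000, leaving 3 per 1000 as net-migration.
example = net_migration_rate(10000, 10150, birth_rate=20.0, death_rate=8.0)
```

A positive value indicates net in-migration and a negative value net out-migration under this convention.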

We recommend using the data either over multiple years (we provide 3, 5 and 20 year net-migration sums at gridded level) or aggregated over a larger area (we provide adm0, adm1 and adm2 level geospatial polygon files). This is due to some noise in the gridded annual data.

Due to copyright issues we are not able to release all the original data collected, but they can be requested from the authors.

List of datasets

Birth and death rates:

raster_birth_rate_2000_2019.tif: Gridded birth rate for 2000-2019 (5 arc-min; multiband tif)

raster_death_rate_2000_2019.tif: Gridded death rate for 2000-2019 (5 arc-min; multiband tif)

tabulated_adm1adm0_birth_rate.csv: Tabulated birth rate for 2000-2019 at the administrative division for which data were collected (sub-national data when available, otherwise national)

tabulated_adm1adm0_death_rate.csv: Tabulated death rate for 2000-2019 at the administrative division for which data were collected (sub-national data when available, otherwise national)

Net-migration:

raster_netMgr_2000_2019_annual.tif: Gridded annual net-migration 2000-2019 (5 arc-min; multiband tif)

raster_netMgr_2000_2019_3yrSum.tif: Gridded 3-yr sum net-migration 2000-2019 (5 arc-min; multiband tif)

raster_netMgr_2000_2019_5yrSum.tif: Gridded 5-yr sum net-migration 2000-2019 (5 arc-min; multiband tif)

raster_netMgr_2000_2019_20yrSum.tif: Gridded 20-yr sum net-migration 2000-2019 (5 arc-min)

polyg_adm0_dataNetMgr.gpkg: National (adm 0 level) net-migration geospatial file (gpkg)

polyg_adm1_dataNetMgr.gpkg: Provincial (adm 1 level) net-migration geospatial file (gpkg) (if not adm 1 level division, adm 0 used)

polyg_adm2_dataNetMgr.gpkg: Communal (adm 2 level) net-migration geospatial file (gpkg) (if not adm 2 level division, adm 1 used; and if not adm 1 level division either, adm 0 used)

Files to run online net migration explorer

masterData.rds and admGeoms.rds are related to our online 'Net-migration explorer' tool (https://wdrg.aalto.fi/global-net-migration-explorer/). The source code of this application is available at https://github.com/vvirkki/net-migration-explorer. Running the application locally requires these two .rds files from this repository.

Metadata

Grids:

Resolution: 5 arc-min (0.083333333 degrees)

Spatial extent: -180, 180; -90, 90 (xmin, xmax, ymin, ymax)

Coordinate ref system: EPSG:4326 - WGS 84

Format: Multiband geotiff; each band for each year over 2000-2019

Units:

Birth and death rates: births/deaths per 1000 people per year

Net-migration: persons per 1000 people per time period (year, 3yr, 5yr, 20yr, depending on the dataset)
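
To make the grid metadata above concrete, the sketch below maps a longitude/latitude pair to a row and column on a global 5 arc-min grid and maps a year to a band number. It assumes a north-up layout with row 0 at latitude 90 and bands ordered chronologically starting from band 1 for 2000; neither is confirmed by the description, and the function names are hypothetical. Check the actual GeoTIFF metadata before relying on this.

```python
import math

CELLS_PER_DEGREE = 12          # 5 arc-min = 1/12 degree
N_COLS = 360 * CELLS_PER_DEGREE  # 4320 columns across -180..180
N_ROWS = 180 * CELLS_PER_DEGREE  # 2160 rows across 90..-90


def lonlat_to_rowcol(lon, lat):
    """Return (row, col) of the cell containing the point.

    Assumes row 0 starts at latitude 90 and column 0 at longitude -180.
    Points on the eastern or southern edge are clamped into the last cell.
    """
    col = math.floor((lon + 180.0) * CELLS_PER_DEGREE)
    row = math.floor((90.0 - lat) * CELLS_PER_DEGREE)
    row = min(max(row, 0), N_ROWS - 1)
    col = min(max(col, 0), N_COLS - 1)
    return row, col


def year_to_band(year):
    """Return the 1-based band number for a year, assuming band 1 is 2000."""
    if not 2000 <= year <= 2019:
        raise ValueError("year must be within 2000-2019")
    return year - 2000 + 1
```

For the multi-year sum files the band-to-period mapping is not stated in the description, so year_to_band applies only to the annual rasters under the assumption above.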

Geospatial polygon (gpkg) files:

Spatial extent: -180, 180; -90, 83.67 (xmin, xmax, ymin, ymax)

Temporal extent: annual over 2000-2019

Coordinate ref system: EPSG:4326 - WGS 84

Format: gpkg

Units:

Net-migration: persons per 1000 people per year
