Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the book is People and education in the Third World. It features 7 columns including author, publication date, language, and book publisher.
By data.world's Admin [source]
This dataset contains aggregated spellings and misspellings of the names of 15 famous celebrities. Ever wonder if people can actually spell someone's name correctly? Now you can see for yourself with this compiled data from The Pudding's interactive spelling experiment, The Gyllenhaal Experiment! It is interesting to see which names get misspelled more than others - some are easy to guess, some are surprising! With the data provided here, you can start uncovering trends in name-spelling habits. Visualize the data and start analyzing how unique or common each celebrity is with respect to spelling - who stands out? Who blends in? Check it out today and explore a side of celebrity life that hasn't been seen before!
This dataset contains misspellings of the names of 15 famous celebrities. It can be used for a variety of research and analysis purposes, including exploring human language, understanding how names are misspelled, or generating data visualizations.
In order to get the most out of this dataset, you will need to familiarize yourself with its columns. The dataset consists of two columns: “data” and “updated”. The “data” column contains the misspellings associated with each celebrity name. The “updated” column is automatically updated with the date on which the data was last changed or modified.
To use this dataset for your own research and analysis, you may find it useful to filter out certain types of responses or patterns in order to focus on particular trends or topics of interest. For example, if you are interested in exploring how spellings vary by region, you might group together similar responses regardless of whether they exactly match one celebrity name over another (i.e., categorizing all spellings that follow a certain phonetic pattern). You can also separate different types of responses into groups to explore different aspects such as popularity (i.e., looking at which misspellings occurred most frequently).
- Creating an interactive quiz for users to test their spelling ability by challenging them to spell names correctly from the celebrity dataset.
- Building a dictionary database of the misspellings, fans’ nicknames and phonetic spellings of each celebrity so that people can find more information about them more easily and accurately.
- Measuring the popularity of individual celebrities by tracking the frequency with which their name is misspelled (a minimal sketch of this follows below).
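To get started with the frequency idea above, here is a minimal pandas sketch. It assumes the data-all.csv layout documented in the file table below (a data column holding the spellings); treat it as a starting point rather than a polished analysis.

```python
import pandas as pd

# Load the misspellings file (layout documented in the file table below).
df = pd.read_csv("data-all.csv")

# Rank spellings by how often they were submitted; the most frequent
# entries are the most common (mis)spellings in the experiment.
counts = df["data"].value_counts()
print(counts.head(20))
```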
If you use this dataset in your research, please credit the original authors.
Data Source: see the dataset description for more information.
File: data-all.csv

| Column name | Description |
|:------------|:------------|
| data | Misspellings of celebrity names. (String) |
| updated | Date when the misspelling was last updated. (Date) |
If you use this dataset in your research, please credit the original authors and data.world's Admin.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All cities with a population > 1000 or seats of administrative divisions (ca. 80,000 entries).
Sources and contributions. Sources: GeoNames aggregates over a hundred different data sources. Ambassadors: GeoNames ambassadors help in many countries. Wiki: a wiki allows users to view the data and quickly fix errors and add missing places. Donations and sponsoring: costs for running GeoNames are covered by donations and sponsoring.
Enrichment: add country name.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 2 rows and is filtered where the book is Between heaven and earth : the religious worlds people make and the scholars who study them. It features 7 columns including author, publication date, language, and book publisher.
The Associated Press is sharing data from the COVID Impact Survey, which provides statistics about physical health, mental health, economic security and social dynamics related to the coronavirus pandemic in the United States.
Conducted by NORC at the University of Chicago for the Data Foundation, the probability-based survey provides estimates for the United States as a whole, as well as in 10 states (California, Colorado, Florida, Louisiana, Minnesota, Missouri, Montana, New York, Oregon and Texas) and eight metropolitan areas (Atlanta, Baltimore, Birmingham, Chicago, Cleveland, Columbus, Phoenix and Pittsburgh).
The survey is designed to allow for an ongoing gauge of public perception, health and economic status to see what is shifting during the pandemic. When multiple sets of data are available, it will allow for the tracking of how issues ranging from COVID-19 symptoms to economic status change over time.
The survey is focused on three core areas of research:
Instead, use our queries linked below or statistical software such as R or SPSS to weight the data.
If you'd like to create a table to see how people nationally or in your state or city feel about a topic in the survey, use the survey questionnaire and codebook to match a question (the variable label) to a variable name. For instance, "How often have you felt lonely in the past 7 days?" is variable "soc5c".
Nationally: Go to this query and enter soc5c as the variable. Hit the blue Run Query button in the upper right hand corner.
Local or State: To find figures for that response in a specific state, go to this query and type in a state name and soc5c as the variable, and then hit the blue Run Query button in the upper right hand corner.
A sentence you could write from these queries is: "People in some states are less likely to report loneliness than others. For example, 66% of Louisianans report feeling lonely on none of the last seven days, compared with 52% of Californians. Nationally, 60% of people said they hadn't felt lonely."
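If you prefer to compute such figures in statistical software rather than the hosted queries, a weighted tabulation is straightforward in pandas. This is a hedged sketch: the file name and the weight column name are assumptions, so consult the codebook for the actual weight variable before relying on it.

```python
import pandas as pd

# Assumed file and column names -- consult the codebook for the real ones.
df = pd.read_csv("covid_impact_survey.csv")

weight_col = "weight"  # placeholder: the survey's weighting variable
var = "soc5c"          # "How often have you felt lonely in the past 7 days?"

# Weighted share of respondents in each response category, in percent.
shares = df.groupby(var)[weight_col].sum() / df[weight_col].sum() * 100
print(shares.round(1))
```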
The margin of error for the national and regional surveys is found in the attached methods statement. You will need the margin of error to determine if the comparisons are statistically significant. If the difference is:
The survey data will be provided under embargo in both comma-delimited and statistical formats.
Each set of survey data will be numbered and have the date the embargo lifts in front of it, in the format: 01_April_30_covid_impact_survey. The survey has been organized by the Data Foundation, a non-profit, non-partisan think tank, and is sponsored by the Federal Reserve Bank of Minneapolis and the Packard Foundation. It is conducted by NORC at the University of Chicago, a non-partisan research organization. (NORC is not an abbreviation; it is part of the organization's formal name.)
Data for the national estimates are collected using the AmeriSpeak Panel, NORC’s probability-based panel designed to be representative of the U.S. household population. Interviews are conducted with adults age 18 and over representing the 50 states and the District of Columbia. Panel members are randomly drawn from AmeriSpeak with a target of achieving 2,000 interviews in each survey. Invited panel members may complete the survey online or by telephone with an NORC telephone interviewer.
Once all the study data have been made final, an iterative raking process is used to adjust for any survey nonresponse as well as any noncoverage or under- and oversampling resulting from the study-specific sample design. Raking variables include age, gender, census division, race/ethnicity, education, and county groupings based on county-level counts of the number of COVID-19 deaths. Demographic weighting variables were obtained from the 2020 Current Population Survey. The count of COVID-19 deaths by county was obtained from USA Facts. The weighted data reflect the U.S. population of adults age 18 and over.
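For readers unfamiliar with raking, its core is iterative proportional fitting: weights are repeatedly scaled so that each raking variable's weighted distribution matches its population target. The sketch below illustrates the idea only; it is not NORC's production weighting code, and the variables and targets are toy values.

```python
import numpy as np

def rake(weights, groups, targets, n_iter=100, tol=1e-8):
    """Iterative proportional fitting: scale weights until each grouping's
    weighted category shares match the target shares."""
    w = weights.astype(float).copy()
    for _ in range(n_iter):
        max_shift = 0.0
        for g, t in zip(groups, targets):
            t = np.asarray(t, dtype=float)
            # Current weighted share of each category of this variable.
            cur = np.bincount(g, weights=w, minlength=len(t)) / w.sum()
            factor = t / cur
            max_shift = max(max_shift, np.abs(factor - 1.0).max())
            w *= factor[g]
        if max_shift < tol:
            break
    return w

# Toy example: rake six respondents to 50/50 gender and 40/60 age targets.
gender = np.array([0, 0, 0, 1, 1, 1])
age = np.array([0, 1, 1, 0, 1, 1])
w = rake(np.ones(6), [gender, age], [[0.5, 0.5], [0.4, 0.6]])
print(w / w.sum())
```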
Data for the regional estimates are collected using a multi-mode address-based sampling (ABS) approach that allows residents of each area to complete the interview via web or with an NORC telephone interviewer. All sampled households are mailed a postcard inviting them to complete the survey either online using a unique PIN or via telephone by calling a toll-free number. Interviews are conducted with adults age 18 and over with a target of achieving 400 interviews in each region in each survey. Additional details on the survey methodology and the survey questionnaire are attached below or can be found at https://www.covid-impact.org.
Results should be credited to the COVID Impact Survey, conducted by NORC at the University of Chicago for the Data Foundation.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the book is Denying democracy : how the IMF and World Bank take power from people. It features 7 columns including author, publication date, language, and book publisher.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In 2022, author Lotte Burkhardt published a free PDF entitled Encyclopedia of Eponymic Plant Names. It consists of two volumes: one lists all plant, algae, lichen, fossil plant, and fungal genera together with the person each was named after; the other takes the list of people honoured and lists the genera named after them. It can be found online here.
This dataset was created by Carmen Ulloa Ulloa by scraping the PDF of the A-Z names of people honoured and converting it into a Google Sheet. That data were normalized so that each row represents a person, with the eponymic genera and the associated families split into multiple columns to make analysis easier. The data was then cleaned, as the conversion from PDF was not 100% accurate, with some names split onto multiple lines, characters misread, etc. The gender of the honourees was annotated by the Women Plant Genera working group as part of our follow-up work to a previous paper.
We have split the resulting table into three files. The first contains the entire list of people honoured and the genera named for them. The other two split that table: one contains just the flowering plant genera, and the other excludes plant genera.
Most of the women in the plants-only tab have already been marked up through this project. More information could be added for the women for whom non-plant genera were named. We highly encourage anyone interested in carrying out their own analysis based on this data to do so, and to get in touch with us with any questions. We anticipate that work on additional groups will deepen our understanding of the impact of the contributions women have made to botany. Our hope is that by making this dataset publicly available, others will explore the world of genera and eponymy, looking at the interesting stories of the people for whom genera were named.
The team would be grateful for any updates or corrections to this data, and we plan to publish updated versions of this dataset accordingly.
https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/cc-by/cc-by_f24dc630aa52ab8c52a0ac85c03bc35e0abc850b4d7453bdc083535b41d5a5c3.pdf
ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis.

Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations and, when going further back in time, to allow for the ingestion of improved versions of the original observations, all of which benefits the quality of the reanalysis product.

ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals. Ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system, which has evolved considerably over time. They also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread.

ERA5 is updated daily with a latency of about 5 days. If serious flaws are detected in this early release (called ERA5T), the data could differ from the final release 2 to 3 months later; users are notified if this occurs.

The dataset presented here is a regridded subset of the full ERA5 dataset at native resolution. It is online on spinning disk, which should ensure fast and easy access, and it should satisfy the requirements of most common applications. An overview of all ERA5 datasets can be found in this article. Information on access to ERA5 data at native resolution is provided in these guidelines. Data has been regridded to a regular lat-lon grid of 0.25 degrees for the reanalysis and 0.5 degrees for the uncertainty estimate (0.5 and 1 degree respectively for ocean waves). There are four main subsets: hourly and monthly products, both on pressure levels (upper-air fields) and single levels (atmospheric, ocean-wave and land-surface quantities). The present entry is "ERA5 hourly data on pressure levels from 1940 to present".
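Programmatic access to this entry typically goes through the CDS API. A minimal sketch follows; the request values are illustrative (one variable, one level, one day), and a registered CDS account with configured credentials is assumed.

```python
import cdsapi

c = cdsapi.Client()  # reads credentials from ~/.cdsapirc

c.retrieve(
    "reanalysis-era5-pressure-levels",
    {
        "product_type": "reanalysis",
        "variable": "temperature",      # illustrative choice
        "pressure_level": "500",
        "year": "2020",
        "month": "01",
        "day": "01",
        "time": ["00:00", "12:00"],
        "format": "netcdf",
    },
    "era5_t500_20200101.nc",
)
```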
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We release the DOO-RE dataset, which consists of data streams from 11 types of ambient sensors, collected 24/7 from a real-world meeting room. Four types of ambient sensors, called environment-driven sensors, measure continuous state changes in the environment (e.g. sound); four types, called user-driven sensors, capture user state changes (e.g. motion); and the remaining three types, called actuator-driven sensors, check whether the attached actuators are active (e.g. projector on/off). The values of each sensor are automatically collected by the IoT agents responsible for each sensor in our IoT system. A part of the collected sensor data stream representing a user activity is extracted as an activity episode in the DOO-RE dataset. Each episode's activity labels are annotated and validated by cross-checking and the consent of multiple annotators. A total of 9 activity types appear in the space: 3 based on single users and 6 based on groups (i.e. 2 or more people). As a result, DOO-RE comprises 696 labeled episodes of single and group activities from the meeting room. DOO-RE is a novel dataset created in a public space that captures the properties of a real-world environment, and it has the potential to be a good resource for developing powerful activity recognition approaches.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Worldwide Soundscapes project is a global, open inventory of spatio-temporally replicated soundscape datasets. This Zenodo entry comprises the data tables that constitute its (meta-)database, as well as their description.
The overview of all sampling sites can be found on the corresponding project on ecoSound-web, as well as a demonstration collection containing selected recordings. More information on the project can be found here and on ResearchGate.
The audio recording criteria justifying inclusion into the meta-database are:
The individual columns of the provided data tables are described in the following. Data tables are linked through primary keys; joining them will result in a database.
datasets
datasets-sites
sites
deployments
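Since the tables are linked through primary keys, assembling the database is a matter of joins. A hedged pandas sketch follows; the file and key column names are assumptions for illustration, so check the column descriptions for the real ones.

```python
import pandas as pd

# File and key names below are assumptions for illustration.
datasets = pd.read_csv("datasets.csv")
datasets_sites = pd.read_csv("datasets-sites.csv")
sites = pd.read_csv("sites.csv")
deployments = pd.read_csv("deployments.csv")

db = (
    datasets
    .merge(datasets_sites, on="dataset_id", how="left")  # assumed key
    .merge(sites, on="site_id", how="left")              # assumed key
    .merge(deployments, on="site_id", how="left")        # assumed key
)
print(db.shape)
```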
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Austin's data portal activity metrics’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/data-portal-activity-metricse on 13 February 2022.
--- Dataset description provided by original source is as follows ---
Background
Austin's open data portal provides lots of public data about the City of Austin. It also provides portal administrators with behind-the-scenes information about how the portal is used... but that data is mysterious, hard to handle in a spreadsheet, and not located all in one place.
Until now! Authorized city staff used admin credentials to grab this usage data and share it with the public. The City of Austin wants to use this data to inform the development of its open data initiative and manage the open data portal more effectively.
This project contains related datasets for anyone to explore. These include site-level metrics, dataset-level metrics, and department information for context. A detailed description of how the files were prepared (along with code) can be found on GitHub here.
Example questions to answer about the data portal
- What parts of the open data portal do people seem to value most?
- What can we tell about who our users are?
- How are our data publishers doing?
- How much data is published programmatically vs manually?
- How much data is super fresh? Super stale?
- Whatever you think we should know...
About the files
all_views_20161003.csv
There is a resource available to portal administrators called "Dataset of datasets". This is the export of that resource, and it was captured on Oct 3, 2016. It contains a summary of the assets available on the data portal. While this file contains over 1400 resources (such as views, charts, and binary files), only 363 are actual tabular datasets.
table_metrics_ytd.csv
This file contains information about the 363 tabular datasets on the portal. Activity metrics for an individual dataset can be accessed by calling Socrata's views/metrics API and passing along the dataset's unique ID, a time frame, and admin credentials. The process of obtaining the 363 identifiers, calling the API, and staging the information can be reviewed in the python notebook here.
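The shape of that call, sketched with requests: the endpoint path, parameter names, and epoch-millisecond time frame here are assumptions based on the description above, and real admin credentials are required.

```python
import requests

DATASET_ID = "xxxx-xxxx"  # placeholder: a dataset's unique identifier
url = f"https://data.austintexas.gov/api/views/{DATASET_ID}/metrics.json"

resp = requests.get(
    url,
    params={"start": 1420070400000, "end": 1475452800000},  # time frame (ms)
    auth=("admin_user", "admin_password"),  # placeholder admin credentials
)
resp.raise_for_status()
print(resp.json())
```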
site_metrics.csv
This file is the export of site-level stats that Socrata generates using a given time frame and grouping preference. This file contains records about site usage each month from Nov 2011 through Sept 2016. By the way, it contains 285 columns... and we don't know what many of them mean. But we are determined to find out!! For a preliminary exploration of the columns and the portal-related business processes they might relate to, check out the notes in this python notebook here
city_departments_in_current_budget.csv
This file contains a list of all City of Austin departments according to how they're identified in the most recently approved budget documents. Could be helpful for getting to know more about who the publishers are.
crosswalk_to_budget_dept.csv
The City is in the process of standardizing how departments identify themselves on the data portal. In the meantime, here's a crosswalk from the department values observed in
all_views_20161003.csv
to the department names that appear in the City's budget.

This dataset was created by Hailey Pate and contains around 100 samples along with Di Sync Success, Browser Firefox 19, technical information, and other features such as:
- Browser Firefox 33
- Di Sync Failed
- and more
- Analyze Sf Query Error User in relation to Js Page View Admin
- Study the influence of Browser Firefox 37 on Datasets Created
If you use this dataset in your research, please credit Hailey Pate
--- Original source retains full ownership of the source dataset ---
This dataset provides a curated and labeled subset of employer entries derived from Wikidata, with the goal of improving the quality and usability of employer data. While Wikidata is an invaluable open resource, direct use often necessitates cleaning. This dataset addresses that need by offering metadata, statistics, and labels to help users identify and utilise valid employer information. An employer is generally defined here as a company or entity that provides employment paying wages or a salary. The dataset specifically screens out entries that do not represent true employers, such as individuals or plurals. It is particularly useful for tasks involving data cleaning, entity recognition, and understanding employment nomenclature.
This dataset is provided as a single CSV file, named employers.wikidata.all.labeled.csv. Its current version is 1.0, with a file size of approximately 5.98 MB. The dataset contains a substantial number of entries, with item_id having 60656 values, employer having 60456 values, and description having 60640 values.
This dataset is ideal for various applications, including:
* Detecting new trends in employers, occupations, and employment terminology.
* Automatic error correction of employer entries.
* Converting plural forms of entities to singular forms.
* Training Named Entity Recognition (NER) models to identify employer names.
* Building Question/Answer models that can understand and respond to queries about employers.
* Improving the accuracy of FastText language detection models.
* Assessing FastText accuracy with limited data.
The dataset's coverage is global, drawing data from a Wikidata dump dated 2 February 2020. It includes employer entries from various linguistic contexts, as indicated by the language_detected column, showcasing multilingual employer names and descriptions. The content primarily focuses on entities and organisations that meet the definition of an employer, rather than specific demographic groups.
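A minimal sketch for loading the file and inspecting its multilingual coverage follows. The item_id, employer, description and language_detected columns are documented above; the name and values of the validity-label column are assumptions, so inspect the CSV header first.

```python
import pandas as pd

df = pd.read_csv("employers.wikidata.all.labeled.csv")

# Multilingual coverage via the documented language_detected column.
print(df["language_detected"].value_counts().head(10))

# Keep only rows labeled as valid employers; the column name "label" and
# the value "employer" are assumptions -- check df.columns for the real ones.
valid = df[df["label"] == "employer"]
print(len(valid), "of", len(df), "entries kept")
```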
CC BY-SA
This dataset is suitable for:
* Data scientists and machine learning engineers working on natural language processing tasks.
* Researchers interested in data quality, entity resolution, and knowledge graph analysis.
* Developers building applications that require accurate employer information.
* Anyone needing to clean and validate employer data for various analytical or operational purposes.
Original Data Source: ML-You-Can-Use Wikidata Employers labeled
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset for the article "A Predictive Method to Improve the Effectiveness of Twitter Communication in a Cultural Heritage Scenario".
Abstract:
Museums are embracing social technologies in the attempt to broaden their audience and to engage people. Although social communication seems an easy task, media managers know how hard it is to reach millions of people with a simple message. Indeed, millions of posts are competing every day to get visibility in terms of likes and shares and very little research focused on museums communication to identify best practices. In this paper, we focus on Twitter and we propose a novel method that exploits interpretable machine learning techniques to: (a) predict whether a tweet will likely be appreciated by Twitter users or not; (b) present simple suggestions that will help enhancing the message and increasing the probability of its success. Using a real-world dataset of around 40,000 tweets written by 23 world famous museums, we show that our proposed method allows identifying tweet features that are more likely to influence the tweet success.
Code to run a selection of experiments is available at https://github.com/rmartoglia/predict-twitter-ch
Dataset structure
This archive contains the dataset used in the experiments of the above research paper. Only the extracted features for the museum tweet threads (not the full message text) are provided; these are all that is needed for the analyses.
We selected 23 well-known art museums from around the world and grouped them into five groups: G1 (museums with at least three million followers); G2 (museums with more than one million followers); G3 (museums with more than 400,000 followers); G4 (museums with more than 200,000 followers); G5 (Italian museums). From these museums, we analyzed ca. 40,000 tweets, with the number varying from ca. 5k to ca. 11k per museum group, depending on the number of museums in each group.
Content features: these are the features that can be drawn from the content of the tweet itself. We further divide such features into the following two categories:
– Countable: these features have a value ranging over different intervals. We take into consideration: the number of hashtags (i.e., words preceded by #) in the tweet, the number of URLs (i.e., links to external resources), the number of images (e.g., photos and graphical emoticons), the number of mentions (i.e., twitter accounts preceded by @), the length of the tweet (see the sketch after this list);
– On-Off: these features have binary values in {0, 1}. We observe whether the tweet has exclamation marks, question marks, person names, place names, organization names, other names. Moreover, we also take into consideration the tweet topic density: assuming that the involved topics correspond to the hashtags mentioned in the text, we define a tweet as dense in topics if the number of hashtags it contains is greater than a given threshold, set to 5. Finally, we observe the tweet sentiment, which might be present (positive or negative) or not (neutral).
Context features: these features are not drawn from the content of the tweet itself and might give a larger picture of the context in which the tweet was sent. Namely, we take into consideration the part of the day in which the tweet was sent (morning, afternoon, evening and night, respectively from 5:00am to 11:59am, from 12:00pm to 5:59pm, from 6:00pm to 10:59pm and from 11pm to 4:59am), and a boolean feature indicating whether the tweet is a retweet or not.
User features: these features are proper of the user that sent the tweet, and are the same for all the tweets of this user. Namely we consider the name of the museum and the number of followers of the user.
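To make the feature definitions concrete, here is an illustrative re-implementation of a few of them in Python. It covers only the pattern-based features (hashtags, URLs, mentions, length, punctuation, topic density); images, named entities and sentiment need additional tooling, and the paper's exact extraction rules may differ.

```python
import re

def countable_features(tweet: str) -> dict:
    """Countable content features: hashtags, URLs, mentions, length."""
    return {
        "n_hashtags": len(re.findall(r"#\w+", tweet)),
        "n_urls": len(re.findall(r"https?://\S+", tweet)),
        "n_mentions": len(re.findall(r"@\w+", tweet)),
        "length": len(tweet),
    }

def on_off_features(tweet: str, topic_threshold: int = 5) -> dict:
    """A few of the binary on-off features, including topic density."""
    n_hashtags = len(re.findall(r"#\w+", tweet))
    return {
        "has_exclamation": int("!" in tweet),
        "has_question": int("?" in tweet),
        "topic_dense": int(n_hashtags > topic_threshold),
    }

tweet = "New exhibit opens today! #art #museum @everyone https://example.org"
print(countable_features(tweet))
print(on_off_features(tweet))
```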
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The GBIF Backbone Taxonomy is a single, synthetic management classification with the goal of covering all names GBIF is dealing with. It's the taxonomic backbone that allows GBIF to integrate name based information from different resources, no matter if these are occurrence datasets, species pages, names from nomenclators or external sources like EOL, Genbank or IUCN. This backbone allows taxonomic search, browse and reporting operations across all those resources in a consistent way and to provide means to crosswalk names from one source to another.
It is updated regularly through an automated process in which the Catalogue of Life acts as a starting point, also providing the complete higher classification above families. Additional scientific names found only in other authoritative nomenclatural and taxonomic datasets are then merged into the tree, thus extending the original catalogue and broadening the backbone's name coverage. The GBIF Backbone Taxonomy also includes identifiers for Operational Taxonomic Units (OTUs) drawn from the barcoding resources iBOL and UNITE.
International Barcode of Life project (iBOL), Barcode Index Numbers (BINs). BINs are connected to a taxon name and its classification by taking into account all names applied to the BIN and picking names with at least 80% consensus. If there is no consensus of name at the species level, the selection process is repeated moving up the major Linnaean ranks until consensus is achieved.
UNITE - Unified system for the DNA based fungal species, Species Hypotheses (SHs). SHs are connected to a taxon name and its classification based on the determination of the RefS (reference sequence) if present or the RepS (representative sequence). In the latter case, if there is no match in the UNITE taxonomy, the lowest rank with 100% consensus within the SH will be used.
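The consensus rule described above can be sketched as a small function: try the species level first, and if fewer than 80% of the applied names agree, climb the Linnaean ranks until some rank reaches consensus. This is a toy illustration of the rule as stated, not GBIF's actual implementation.

```python
from collections import Counter

RANKS = ["species", "genus", "family", "order", "class", "phylum", "kingdom"]

def consensus_name(names_by_rank, threshold=0.8):
    """Return (rank, name) for the lowest rank where one name reaches
    the consensus threshold among all names applied to the BIN."""
    for rank in RANKS:
        names = names_by_rank.get(rank, [])
        if not names:
            continue
        name, count = Counter(names).most_common(1)[0]
        if count / len(names) >= threshold:
            return rank, name
    return None, None

# No 80% agreement at species level, but full agreement at genus level.
print(consensus_name({
    "species": ["Apis mellifera", "Apis cerana", "Apis mellifera"],
    "genus": ["Apis", "Apis", "Apis"],
}))  # -> ('genus', 'Apis')
```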
The GBIF Backbone Taxonomy is available for download at https://hosted-datasets.gbif.org/datasets/backbone/ in different formats together with an archive of all previous versions.
The following 105 sources have been used to assemble the GBIF backbone, with the number of names given in brackets:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
People Data Labs is an aggregator of B2B person and company data. We source our globally compliant person dataset via our "Data Union".
The "Data Union" is our proprietary data sharing co-op. Customers opt-in to sharing their data and warrant that their data is fully compliant with global data privacy regulations. Some data sources are provided as a one time dump, others are refreshed every time we do a new data build. Our data sources come from a variety of verticals including HR Tech, Real Estate Tech, Identity/Anti-Fraud, Martech, and others. People Data Labs works with customers on compliance based topics. If a customer wishes to ensure anonymity, we work with them to anonymize the data.
Our company data has identifying information (name, website, social profiles), company attributes (industry, size, founded date), and tags + free text that is useful for segmentation.
https://brightdata.com/license
With in-depth information on individuals who have been included in the international sanctions list and are currently facing economic sanctions from various countries and international organizations, you can benefit greatly. Our list includes key data attributes such as - first name, last name, citizenship, passport details, address, date of proscription & reason for listing. The comprehensive information on individuals listed on the international sanctions list helps organizations ensure compliance with sanctions regulations and avoid any potential risks associated with doing business with sanctioned entities.
Popular attributes:
✔ Financial Intelligence
✔ Credit Risk Analysis
✔ Compliance
✔ Bank Data Enrichment
✔ Account Profiling
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 2 rows and is filtered where the book is Baptists through the centuries : a history of a global people. It features 7 columns including author, publication date, language, and book publisher.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Worldwide Soundscapes project is a global, open inventory of spatio-temporally replicated passive acoustic monitoring meta-datasets (i.e. meta-data collections). This Zenodo entry comprises the data tables that constitute its (meta-)database, as well as their description. Additionally, R scripts are provided to replicate the analysis published in [placeholder].
The overview of all sampling sites and timelines can be found on the corresponding project on ecoSound-web, as well as a demonstration collection containing selected recordings. The recordings of this collection were annotated and analysed to explore macro-ecological trends.
The audio recording criteria justifying inclusion into the meta-database are:
The individual columns of the provided data tables are described in the following. Data tables are linked through primary keys; joining them will result in a database. The data shared here only includes validated collections.
Changes from version 4.0.0
Added link to the published synthesis.
Meta-database CSV files
collections
collections-sites
sites
deployments
recordings (partial download from ecoSound-web)
The TrajNet Challenge represents a large multi-scenario forecasting benchmark. The challenge consists of predicting 3161 human trajectories, observing for each trajectory 8 consecutive ground-truth values (3.2 seconds), i.e., t−7, t−6, …, t, in world plane coordinates (the so-called world plane Human-Human protocol) and forecasting the following 12 (4.8 seconds), i.e., t+1, …, t+12. The 8-12-value protocol is consistent with most trajectory forecasting approaches, which usually focus on the 5-dataset scenario ETH-univ + ETH-hotel + UCY-zara01 + UCY-zara02 + UCY-univ. TrajNet substantially extends the 5-dataset scenario by diversifying the training data, thus stressing the flexibility and generalization an approach has to exhibit when it comes to unseen scenery/situations. In fact, TrajNet is a superset of diverse datasets that requires training on four families of trajectories, namely 1) BIWI Hotel (orthogonal bird's-eye flight view, moving people), 2) Crowds UCY (3 datasets, tilted bird's-eye view, camera mounted on buildings or utility poles, moving people), 3) MOT PETS (multisensor, different human activities) and 4) Stanford Drone Dataset (8 scenes, high orthogonal bird's-eye flight view, different agents such as people, cars, etc.), for a total of 11448 trajectories. Testing is requested on diverse partitions of BIWI Hotel, Crowds UCY, Stanford Drone Dataset, and is evaluated by a specific server (ground-truth testing data is unavailable to applicants).
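As a concrete illustration of the 8-in/12-out protocol, here is a constant-velocity baseline in numpy. It is a common sanity-check baseline for this kind of benchmark, not part of the challenge itself.

```python
import numpy as np

def constant_velocity_forecast(obs: np.ndarray, horizon: int = 12) -> np.ndarray:
    """Extrapolate the last observed displacement for `horizon` steps.
    `obs` has shape (8, 2): eight observed (x, y) world-plane positions."""
    velocity = obs[-1] - obs[-2]                # last-step displacement
    steps = np.arange(1, horizon + 1)[:, None]  # 1..horizon as a column
    return obs[-1] + steps * velocity           # shape (horizon, 2)

obs = np.linspace([0.0, 0.0], [0.7, 0.0], 8)    # straight walk along x
print(constant_velocity_forecast(obs)[:3])      # first three predictions
```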