57 datasets found

Firmographic Data on all 300 million businesses worldwide in single Dataset
datarade.ai
Updated Oct 28, 2020
Cite
BoldData (2020). Firmographic Data on all 300 million businesses worldwide in single Dataset [Dataset]. https://datarade.ai/data-products/data-cleansing-bolddata
Explore at:
Available download formats: .csv, .xls, .json, .txt
Dataset updated
Oct 28, 2020
Dataset authored and provided by
BoldData
Area covered
Azerbaijan, Kenya, Suriname, Guam, Romania, Canada, Palau, Iraq, Estonia, Montserrat
Description
Every contact in our firmographic database of 341 million+ companies comes directly from local sources that you can trust and is GDPR-proof. We can deliver 200 firmographics such as company size, industry, legal status, revenue, employee size, opening hours, geocodes, and import/export activity. BoldData is the no. 1 supplier of firmographic data because we make use of thousands of local data sources. Ask us for a quote!

Commute Source Intensity

data.wfrc.org

Updated Oct 18, 2018

Cite

Wasatch Front Regional Council (2018). Commute Source Intensity [Dataset]. https://data.wfrc.org/datasets/wfrc::commute-source-intensity/about

Explore at:

Dataset updated

Oct 18, 2018

Dataset authored and provided by

Wasatch Front Regional Council

License

CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically

Area covered

Description

All grid squares are approximately the size of a downtown Salt Lake City block. For each grid square, three metrics are available, each of which reflects a total from the analysis grid square and its 12 nearest grid squares (those whose center is within 0.25 miles of the boundary of the analysis grid square). Geometrically, the 12 nearest cells are two cells in each cardinal direction and one cell in each diagonal direction (see graphic below).

 The three attribute values, representing metrics for the current 2015 model base year, are:

 Nearby Employment Intensity (NEI):
 Jobs within a quarter mile of each grid square. County-level job counts are controlled to the official Gardner Policy Institute (GPI) estimates for 2015. Job locations are then determined using the WFRC/MAG Real Estate Market Model (a customized implementation of the UrbanSim open source software), using county assessor tax parcel data together with generalized job data from the Department of Workforce Services as key model inputs.

 Nearby Residential Intensity (NRI):
 Households within Quarter Mile. County-level household counts are controlled to the official Gardner Policy Institute (GPI) estimates for 2015. Household locations are determined using the WFRC/MAG REM model using county assessor tax parcel data together with US Census population (block level) as key model inputs.

 Nearby Combined Intensity (NCI):
 Jobs plus scaled households within a quarter mile of each grid square. To give NEI and NRI equal weighting, the NRI household number is scaled by multiplying by 1,295,513 (the total number of jobs in the region) and dividing by 731,392 (the total number of households in the region).
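The NCI scaling described above can be sketched in a few lines. This is only an illustration of the stated arithmetic, not the WFRC implementation; the function name and the example grid-square values are hypothetical.

```python
# Regional totals quoted in the description above.
TOTAL_JOBS = 1_295_513
TOTAL_HOUSEHOLDS = 731_392

# Jobs per household in the region. Multiplying NRI by this factor puts
# households on the same scale as jobs before the two are added.
HOUSEHOLD_SCALE = TOTAL_JOBS / TOTAL_HOUSEHOLDS


def nearby_combined_intensity(nei: float, nri: float) -> float:
    """Return NCI = NEI + NRI * (total jobs / total households)."""
    return nei + nri * HOUSEHOLD_SCALE


# Hypothetical grid square: 500 nearby jobs and 200 nearby households.
example_nci = nearby_combined_intensity(500, 200)
print(round(HOUSEHOLD_SCALE, 3), round(example_nci, 1))
```

With this scaling, one household counts for roughly 1.77 jobs, so the regional household total and the regional job total contribute equally to the combined metric.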

 Quarter mile grid square example graphic:

Blending Data for tracking sales from different sources
datarade.ai
Updated Sep 18, 2023
Cite
Growth Marketing (2023). Blending Data for tracking sales from different sources [Dataset]. https://datarade.ai/data-products/blending-data-for-tracking-sales-from-different-sources-growth-marketing
Explore at:
Dataset updated
Sep 18, 2023
Dataset authored and provided by
Growth Marketing
Area covered
Italy, Latvia, United States of America, Panama, Guernsey, Hungary, Gibraltar, Holy See, Poland, Mexico
Description
Create a custom pipeline using enterprise data integrators which provides high consistency, reliability and scalability of data.

The process to get this is easy:
- analyze data sources (max 2)
- create a sample output based on your expectations/needs
- start creating the pipeline: our internal workflow that provides the output defined
- check and refine
Covid - behavioral effects international datasets
zenodo.org
bin, csv, pdf
Updated Jun 30, 2024
Cite
Peter Lugtig; Moretti Angelo; Katharina Meitinger; Florian Metwaly; Merlin Urbanski; Su Li (2024). Covid - behavioral effects international datasets [Dataset]. http://doi.org/10.5281/zenodo.12601334
Explore at:
Available download formats: bin, pdf, csv
Unique identifier
https://doi.org/10.5281/zenodo.12601334
Dataset updated
Jun 30, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Peter Lugtig; Moretti Angelo; Katharina Meitinger; Florian Metwaly; Merlin Urbanski; Su Li
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset stems from the project ‘Beprepared’ (https://be-prepared-consortium.nl/), which aims to provide in-depth analyses of mixed-method behavioural science data collected throughout the unprecedented COVID-19 pandemic and to inform preparedness strategies for future outbreaks. In approaching the research from a behavioural and social science perspective, researchers focus on four main themes:

· Prevention behaviour, psychosocial and contextual determinants, and (communication) interventions

· Resilience and engagement of citizens, communities and organisations

· Research methodology and preparedness

· Effective and integrated policy advice

This resource links to the theme ‘research methodology’ and provides an overview of datasets that have been used internationally to study the behavioral effects of the Covid-19 pandemic. These datasources can be used to study how people behave in a variety of settings during the Covid pandemic and so to inform policy-makers, but also to study the effects of behavioral interventions. It includes datasources that for example study mobility behavior at a regional or national level, physical distancing in public, health adherence behaviors (like handwashing, mask wearing), social contacts on- and offline, purchasing behaviors (shopping) etc.

The resource consists of two datasets:

1. A dataset (in .xlsx and .csv format) of the search strategy used to come to the list of datasources called “search strategy”

2. A dataset (in .xlsx and .csv format) of the results of the search, called “search results”

In future, a third dataset will be added, in which the data quality of a large subset of datasets from the search will be systematically assessed.
SCAR Southern Ocean Diet and Energetics Database
data.niaid.nih.gov
data.aad.gov.au
Updated Jul 24, 2023
Cite
Scientific Committee on Antarctic Research (2023). SCAR Southern Ocean Diet and Energetics Database [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5072527
Explore at:
Dataset updated
Jul 24, 2023
Dataset authored and provided by
Scientific Committee on Antarctic Research
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Southern Ocean
Description
Information related to diet and energy flow is fundamental to a diverse range of Antarctic and Southern Ocean biological and ecosystem studies. This metadata record describes a database of such information being collated by the SCAR Expert Groups on Antarctic Biodiversity Informatics (EG-ABI) and Birds and Marine Mammals (EG-BAMM) to assist the scientific community in this work. It includes data related to diet and energy flow from conventional (e.g. gut content) and modern (e.g. molecular) studies, stable isotopes, fatty acids, and energetic content. It is a product of the SCAR community and open for all to participate in and use.

Data have been drawn from published literature, existing trophic data collections, and unpublished data. The database comprises five principal tables, relating to (i) direct sampling methods of dietary assessment (e.g. gut, scat, and bolus content analyses, stomach flushing, and observed predation), (ii) stable isotopes, (iii) lipids, (iv) DNA-based diet assessment, and (v) energetics values. The schemas of these tables are described below, and a list of the sources used to populate the tables is provided with the data.

A range of manual and automated checks were used to ensure that the entered data were as accurate as possible. These included visual checking of transcribed values, checking of row or column sums against known totals, and checking for values outside of allowed ranges. Suspicious entries were re-checked against the original source.

Notes on names: Names have been validated against the World Register of Marine Species (http://www.marinespecies.org/). For uncertain taxa, the most specific taxonomic name has been used (e.g. prey reported in a study as "Pachyptila sp." will appear here as "Pachyptila"; "Cephalopods" will appear as "Cephalopoda"). Uncertain species identifications (e.g. "Notothenia rossii?" or "Gymnoscopelus cf. piabilis") have been assigned the genus name (e.g. "Notothenia", "Gymnoscopelus"). Original names have been retained in a separate column to allow future cross-checking. WoRMS identifiers (APHIA_ID numbers) are given where possible.

Grouped prey data in the diet sample table need to be handled with a bit of care. Papers commonly report prey statistics aggregated over groups of prey - e.g. one might give the diet composition by individual cephalopod prey species, and then an overall record for all cephalopod prey. The PREY_IS_AGGREGATE column identifies such records. This allows us to differentiate grouped data like this from unidentified prey items from a certain prey group - for example, an unidentifiable cephalopod record would be entered as Cephalopoda (the scientific name), with "N" in the PREY_IS_AGGREGATE column. A record that groups together a number of cephalopod records, possibly including some unidentifiable cephalopods, would also be entered as Cephalopoda, but with "Y" in the PREY_IS_AGGREGATE column. See the notes on PREY_IS_AGGREGATE, below.
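As a minimal sketch of the double-counting issue described above, the snippet below filters out aggregate rows before summing. Only the column names PREY_NAME and PREY_IS_AGGREGATE come from the schema; the rows, the placeholder species names, and the `hypothetical_count` field are invented for illustration.

```python
# Hypothetical diet records from a single source. "hypothetical_count" is an
# illustrative field, not a column of the actual database.
rows = [
    {"PREY_NAME": "Squid species A", "PREY_IS_AGGREGATE": "N", "hypothetical_count": 12},
    {"PREY_NAME": "Squid species B", "PREY_IS_AGGREGATE": "N", "hypothetical_count": 5},
    # Unidentifiable cephalopods: a real record, so not an aggregate.
    {"PREY_NAME": "Cephalopoda", "PREY_IS_AGGREGATE": "N", "hypothetical_count": 3},
    # Overall cephalopod record that groups the three rows above.
    {"PREY_NAME": "Cephalopoda", "PREY_IS_AGGREGATE": "Y", "hypothetical_count": 20},
]


def without_aggregates(records):
    """Keep only rows that are not summaries of other rows."""
    return [r for r in records if r["PREY_IS_AGGREGATE"] != "Y"]


naive_total = sum(r["hypothetical_count"] for r in rows)
safe_total = sum(r["hypothetical_count"] for r in without_aggregates(rows))
print(naive_total, safe_total)
```

Summing every row counts the cephalopod prey twice, while dropping the "Y" rows keeps each prey item once. An analysis that wants only group-level figures could instead keep the "Y" rows and drop their components.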

There are two related R packages that provide data access and functionality for working with these data. See the package home pages for more information: https://github.com/SCAR/sohungry and https://github.com/SCAR/solong.

Data table schemas

Sources data table

SOURCE_ID: The unique identifier of this source

DETAILS: The bibliographic details for this source (e.g. "Hindell M (1988) The diet of the royal penguin Eudyptes schlegeli at Macquarie Island. Emu 88:219–226")

NOTES: Relevant notes about this source – if it’s a published paper, this is probably the abstract

DOI: The DOI of the source (paper or dataset), in the form "10.xxxx/yyyy"

Diet data table

RECORD_ID: The unique identifier of this record

SOURCE_ID: The identifier of the source study from which this record was obtained (see corresponding entry in the sources data table)

SOURCE_DETAILS, SOURCE_DOI: The details and DOI of the source, copied from the sources data table for convenience

ORIGINAL_RECORD_ID: The identifier of this data record in its original source, if it had one

LOCATION: The name of the location at which the data was collected

WEST: The westernmost longitude of the sampling region, in decimal degrees (negative values for western hemisphere longitudes)

EAST: The easternmost longitude of the sampling region, in decimal degrees (negative values for western hemisphere longitudes)

SOUTH: The southernmost latitude of the sampling region, in decimal degrees (negative values for southern hemisphere latitudes)

NORTH: The northernmost latitude of the sampling region, in decimal degrees (negative values for southern hemisphere latitudes)

ALTITUDE_MIN: The minimum altitude of the sampling region, in metres

ALTITUDE_MAX: The maximum altitude of the sampling region, in metres

DEPTH_MIN: The shallowest depth of the sampling, in metres

DEPTH_MAX: The deepest depth of the sampling, in metres

OBSERVATION_DATE_START: The start of the sampling period

OBSERVATION_DATE_END: The end of the sampling period. If sampling was carried out over multiple seasons (e.g. during January of 2002 and January of 2003), this will be the first and last dates (in this example, from 1-Jan-2002 to 31-Jan-2003)

PREDATOR_NAME: The name of the predator. This may differ from PREDATOR_NAME_ORIGINAL if, for example, taxonomy has changed since the original publication, or if the original publication had spelling errors or used common (not scientific) names

PREDATOR_NAME_ORIGINAL: The name of the predator, as it appeared in the original source

PREDATOR_APHIA_ID: The numeric identifier of the predator in the WoRMS taxonomic register

PREDATOR_WORMS_RANK, PREDATOR_WORMS_KINGDOM, PREDATOR_WORMS_PHYLUM, PREDATOR_WORMS_CLASS, PREDATOR_WORMS_ORDER, PREDATOR_WORMS_FAMILY, PREDATOR_WORMS_GENUS: The taxonomic details of the predator, from the WoRMS taxonomic register

PREDATOR_GROUP_SOKI: A descriptive label of the group to which the predator belongs (currently used in the Southern Ocean Knowledge and Information wiki, http://soki.aq)

PREDATOR_LIFE_STAGE: Life stage of the predator, e.g. "adult", "chick", "larva", "juvenile". Note that if a food sample was taken from an adult animal, but that food was destined for a juvenile, then the life stage will be "juvenile" (this is common with seabirds feeding chicks)

PREDATOR_BREEDING_STAGE: Stage of the breeding season of the predator, if applicable, e.g. "brooding", "chick rearing", "nonbreeding", "posthatching"

PREDATOR_SEX: Sex of the predator: "male", "female", "both", or "unknown"

PREDATOR_SAMPLE_COUNT: The number of predators for which data are given. If (say) 50 predators were caught but only 20 analysed, this column will contain 20. For scat content studies, this will be the number of scats analysed

PREDATOR_SAMPLE_ID: The identifier of the predator(s). If predators are being reported at the individual level (i.e. PREDATOR_SAMPLE_COUNT = 1) then PREDATOR_SAMPLE_ID is the individual animal ID. Alternatively, if the data values being entered here are from a group of predators, then the PREDATOR_SAMPLE_ID identifies that group of predators. PREDATOR_SAMPLE_ID values are unique within a source (i.e. SOURCE_ID, PREDATOR_SAMPLE_ID pairs are globally unique). Rows with the same SOURCE_ID and PREDATOR_SAMPLE_ID values relate to the same predator individual or group of individuals, and so can be combined (e.g. for prey diversity analyses). Subsamples are indicated by a decimal number S.nnn, where S is the parent PREDATOR_SAMPLE_ID, and nnn (001-999) is the subsample number. Studies will sometimes report detailed prey information for a large sample, but then report prey information for various subsamples of that sample (e.g. broken down by predator sex, or sampling season). In the simplest case, the diet of each predator will be reported only once in the study, and in this scenario the PREDATOR_SAMPLE_ID values will simply be 1 to N (for N predators).

PREDATOR_SIZE_MIN, PREDATOR_SIZE_MAX, PREDATOR_SIZE_MEAN, PREDATOR_SIZE_SD: The minimum, maximum, mean, and standard deviation of the size of the predators in the sample

PREDATOR_SIZE_UNITS: The units of size (e.g. "mm")

PREDATOR_SIZE_NOTES: Notes on the predator size information, including a definition of what the size value represents (e.g. "total length", "standard length")

PREDATOR_MASS_MIN, PREDATOR_MASS_MAX, PREDATOR_MASS_MEAN, PREDATOR_MASS_SD: The minimum, maximum, mean, and standard deviation of the mass of the predators in the sample

PREDATOR_MASS_UNITS: The units of mass (e.g. "g", "kg")

PREDATOR_MASS_NOTES: Notes on the predator mass information, including a definition of what the mass value represents

PREY_NAME: The scientific name of the prey item (corrected, if necessary)

PREY_NAME_ORIGINAL: The name of the prey item, as it appeared in the original source

PREY_APHIA_ID: The numeric identifier of the prey in the WoRMS taxonomic register

PREY_WORMS_RANK, PREY_WORMS_KINGDOM, PREY_WORMS_PHYLUM, PREY_WORMS_CLASS, PREY_WORMS_ORDER, PREY_WORMS_FAMILY, PREY_WORMS_GENUS: The taxonomic details of the prey, from the WoRMS taxonomic register

PREY_GROUP_SOKI: A descriptive label of the group to which the prey belongs (currently used in the Southern Ocean Knowledge and Information wiki, http://soki.aq)

PREY_IS_AGGREGATE: "Y" indicates that this row is an aggregation of other rows in this data source. For example, a study might give a number of individual squid species records, and then an overall squid record that encompasses the individual records. Use the PREY_IS_AGGREGATE information to avoid double-counting during analyses

PREY_LIFE_STAGE: Life stage of the prey (e.g. "adult", "chick", "larva")

PREY_SEX: The sex of the prey ("male", "female", "both", or "unknown"). Note that this is generally "unknown"

PREY_SAMPLE_COUNT: The number of prey individuals from which size and mass measurements were made (note: this is NOT the total number of individuals of
Climate Change: Earth Surface Temperature Data
kaggle.com
redivis.com
zip
Updated May 1, 2017
Cite
Berkeley Earth (2017). Climate Change: Earth Surface Temperature Data [Dataset]. https://www.kaggle.com/datasets/berkeleyearth/climate-change-earth-surface-temperature-data
Explore at:
Available download formats: zip (88843537 bytes)
Dataset updated
May 1, 2017
Dataset authored and provided by
Berkeley Earth (http://berkeleyearth.org/)
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Area covered
Earth
Description
Some say climate change is the biggest threat of our age while others say it’s a myth based on dodgy science. We are turning some of the data over to you so you can form your own view.

Even more than with other data sets that Kaggle has featured, there’s a huge amount of data cleaning and preparation that goes into putting together a long-time study of climate trends. Early data was collected by technicians using mercury thermometers, where any variation in the visit time impacted measurements. In the 1940s, the construction of airports caused many weather stations to be moved. In the 1980s, there was a move to electronic thermometers that are said to have a cooling bias.

Given this complexity, there are a range of organizations that collate climate trends data. The three most cited land and ocean temperature data sets are NOAA’s MLOST, NASA’s GISTEMP and the UK’s HadCrut.

We have repackaged the data from a newer compilation put together by Berkeley Earth, which is affiliated with Lawrence Berkeley National Laboratory. The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature reports from 16 pre-existing archives. It is nicely packaged and allows for slicing into interesting subsets (for example by country). They publish the source data and the code for the transformations they applied. They also use methods that allow weather observations from shorter time series to be included, meaning fewer observations need to be thrown away.

In this dataset, we have included several files:

Global Land and Ocean-and-Land Temperatures (GlobalTemperatures.csv):

Date: starts in 1750 for average land temperature and 1850 for max and min land temperatures and global ocean and land temperatures

LandAverageTemperature: global average land temperature in Celsius

LandAverageTemperatureUncertainty: the 95% confidence interval around the average

LandMaxTemperature: global average maximum land temperature in Celsius

LandMaxTemperatureUncertainty: the 95% confidence interval around the maximum land temperature

LandMinTemperature: global average minimum land temperature in Celsius

LandMinTemperatureUncertainty: the 95% confidence interval around the minimum land temperature

LandAndOceanAverageTemperature: global average land and ocean temperature in Celsius

LandAndOceanAverageTemperatureUncertainty: the 95% confidence interval around the global average land and ocean temperature
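The sketch below shows one way the temperature and uncertainty columns might be combined into lower and upper bounds. It rests on several assumptions not stated in the listing: that the uncertainty value is the half-width of the interval, that the date column is named `dt`, and that missing measurements appear as empty fields. The sample rows are made up and do not come from GlobalTemperatures.csv.

```python
import csv
import io

# Made-up sample in the assumed layout of GlobalTemperatures.csv.
SAMPLE = """dt,LandAverageTemperature,LandAverageTemperatureUncertainty
1750-01-01,3.0,3.5
1750-02-01,,
2015-12-01,5.5,0.1
"""


def land_average_intervals(text):
    """Return (date, lower, upper) tuples, skipping rows with missing values.

    Assumes the uncertainty column holds the half-width of the 95% interval.
    """
    intervals = []
    for row in csv.DictReader(io.StringIO(text)):
        average = row["LandAverageTemperature"]
        uncertainty = row["LandAverageTemperatureUncertainty"]
        if not average or not uncertainty:
            continue  # no measurement recorded for this month
        centre, half_width = float(average), float(uncertainty)
        intervals.append((row["dt"], centre - half_width, centre + half_width))
    return intervals


intervals = land_average_intervals(SAMPLE)
for date, lower, upper in intervals:
    print(date, round(lower, 2), round(upper, 2))
```

To run this against the real file, the `io.StringIO` wrapper would be replaced by an open file handle, after checking the actual header names.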

Other files include:

Global Average Land Temperature by Country (GlobalLandTemperaturesByCountry.csv)

Global Average Land Temperature by State (GlobalLandTemperaturesByState.csv)

Global Land Temperatures By Major City (GlobalLandTemperaturesByMajorCity.csv)

Global Land Temperatures By City (GlobalLandTemperaturesByCity.csv)

The raw data comes from the Berkeley Earth data page.
Sharing worker management practice: Theoretical construction and empirical...
scidb.cn
Updated Jul 3, 2024
Cite
gu yin hua (2024). Sharing worker management practice: Theoretical construction and empirical test [Dataset]. http://doi.org/10.57760/sciencedb.psych.00168
Explore at:
Croissant (Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.)
Unique identifier
https://doi.org/10.57760/sciencedb.psych.00168
Dataset updated
Jul 3, 2024
Dataset provided by
Science Data Bank
Authors
gu yin hua
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
License information was derived automatically
Description
【 Study 1 】 Shared employee management practice refers to the policies and systems that sharing platform enterprises implement for shared employees. Constructing a shared employee management practice system therefore requires data from two sides: the sharing platform enterprises and the shared employees. This study collected information on what the platform enterprises have done through official apps and websites, and information on what shared employees feel the platform has done through participatory observation and interviews. Specifically, data for the Didi Chuxing platform came mainly from three channels: the passenger side of the official app, the driver side of the official app, and interviews. Data for the Huajiao live streaming platform came mainly from three channels: official apps and websites, participatory observation, and interviews. For convenience of analysis, the data sources were coded. First, data from the Didi Chuxing platform are coded "A" and data from the Huajiao platform "B". Second, data from the official app are coded "A"; if there are multiple apps, they are coded "A01", "A02", etc. Third, data from the official website are coded "OW". Fourth, data from participatory observation are coded "PO". Fifth, data from interviews are coded "IM". Data from the same source are numbered consecutively. For example, the first and second codes extracted from the participatory observation data of the Huajiao live streaming platform are "YiPO-01" and "YiPO-02", respectively.

【 Study 2 】 A face-to-face questionnaire survey was conducted among 221 ride-hailing drivers in Shanghai and Chengdu, with 8 respondents. The effective questionnaire rate was 100%. The research process was divided into six steps: (1) Design a questionnaire on "Questionnaire Star" and send the link to the researchers (2 people in each group); (2) Train the researchers, focusing on the implementation rules and safety hazards during the research process; (3) Make an appointment with the survey subject: book a ride through the passenger side of the Didi Chuxing app, present identification documents to the ride-hailing driver, inform the driver of the research purpose and payment method, and prepare for the survey with the driver's support; (4) Communicate the research methods, read out the research guidelines, and explain the research process: first, Investigator A reads each question item (including the question stem and options), then asks the driver to choose one of the five options from "strongly disagree" to "strongly agree", and finally Investigator B fills out the questionnaire (while A supervises); (5) Conduct the research; (6) Pay based on the local starting price and the duration.

【 Study 3 】 This study conducted a face-to-face questionnaire survey (Feng Xiaotian, 2009) of 273 ride-hailing drivers in Shanghai and Chengdu, with 8 respondents. The effective questionnaire rate was 100%. The research process and implementation details are the same as in Study 2 and are not repeated here.
Data from: A consensus compound/bioactivity dataset for data-driven drug...
explore.openaire.eu
data.niaid.nih.gov
Updated Mar 2, 2022
Cite
Laura Isigkeit; Apirat Chaikuad; Daniel Merk (2022). A consensus compound/bioactivity dataset for data-driven drug design and chemogenomics [Dataset]. http://doi.org/10.5281/zenodo.6398019
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.6398019
Dataset updated
Mar 2, 2022
Authors
Laura Isigkeit; Apirat Chaikuad; Daniel Merk
Description
This is the updated version of the dataset from 10.5281/zenodo.6320761.

Information

The diverse publicly available compound/bioactivity databases constitute a key resource for data-driven applications in chemogenomics and drug design. Analysis of their coverage of compound entries and biological targets revealed considerable differences, however, suggesting benefit of a consensus dataset. Therefore, we have combined and curated information from five esteemed databases (ChEMBL, PubChem, BindingDB, IUPHAR/BPS and Probes&Drugs) to assemble a consensus compound/bioactivity dataset comprising 1144648 compounds with 10915362 bioactivities on 5613 targets (including defined macromolecular targets as well as cell-lines and phenotypic readouts). It also provides simplified information on assay types underlying the bioactivity data and on bioactivity confidence by comparing data from different sources. We have unified the source databases, brought them into a common format and combined them, enabling an ease for generic uses in multiple applications such as chemogenomics and data-driven drug design.

The consensus dataset provides increased target coverage and contains a higher number of molecules compared to the source databases which is also evident from a larger number of scaffolds. These features render the consensus dataset a valuable tool for machine learning and other data-driven applications in (de novo) drug design and bioactivity prediction. The increased chemical and bioactivity coverage of the consensus dataset may improve robustness of such models compared to the single source databases. In addition, semi-automated structure and bioactivity annotation checks with flags for divergent data from different sources may help data selection and further accurate curation.

This dataset belongs to the publication: https://doi.org/10.3390/molecules27082513

Structure and content of the dataset

Dataset structure

ChEMBL ID | PubChem ID | IUPHAR ID | Target | Activity type | Assay type | Unit | Mean C (0) | ... | Mean PC (0) | ... | Mean B (0) | ... | Mean I (0) | ... | Mean PD (0) | ... | Activity check annotation | Ligand names | Canonical SMILES C | ... | Structure check (Tanimoto) | Source

The dataset was created using the Konstanz Information Miner (KNIME) (https://www.knime.com/) and was exported as a CSV-file and a compressed CSV-file. Except for the canonical SMILES columns, all columns are filled with the datatype ‘string’. The datatype for the canonical SMILES columns is the smiles-format. We recommend the File Reader node for using the dataset in KNIME. With the help of this node the data types of the columns can be adjusted exactly. In addition, only this node can read the compressed format.

Column content:

ChEMBL ID, PubChem ID, IUPHAR ID: chemical identifier of the databases

Target: biological target of the molecule expressed as the HGNC gene symbol

Activity type: for example, pIC50

Assay type: Simplification/Classification of the assay into cell-free, cellular, functional and unspecified

Unit: unit of bioactivity measurement

Mean columns of the databases: mean of bioactivity values or activity comments denoted with the frequency of their occurrence in the database, e.g. Mean C = 7.5 *(15) -> the value for this compound-target pair occurs 15 times in ChEMBL database

Activity check annotation: a bioactivity check was performed by comparing values from the different sources and adding an activity check annotation to provide automated activity validation for additional confidence
- no comment: bioactivity values are within one log unit;
- check activity data: bioactivity values are not within one log unit;
- only one data point: only one value was available, no comparison and no range calculated;
- no activity value: no precise numeric activity value was available;
- no log-value could be calculated: no negative decadic logarithm could be calculated, e.g., because the reported unit was not a compound concentration

Ligand names: all unique names contained in the five source databases are listed

Canonical SMILES columns: Molecular structure of the compound from each database

Structure check (Tanimoto): To denote matching or differing compound structures in different source databases
- match: molecule structures are the same between different sources;
- no match: the structures differ. We calculated the Jaccard-Tanimoto similarity coefficient from Morgan Fingerprints to reveal true differences between sources and reported the minimum value;
- 1 structure: no structure comparison is possible, because there was only one structure available;
- no structure: no structure comparison is possible, because there was no structure available.

Source: From which databases the data come from
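The two checks described above can be illustrated with a small sketch. It is not the authors' KNIME workflow. It assumes that "within one log unit" means the spread between the largest and smallest per-database mean is at most 1.0, and it represents fingerprints as plain sets of on-bit indices instead of computing Morgan fingerprints with a cheminformatics toolkit. The "no log-value could be calculated" case is omitted because it depends on unit handling that the description does not spell out.

```python
def activity_check(log_values):
    """Label a compound-target pair from its per-database mean log activities.

    Assumption: 'within one log unit' is read as max - min <= 1.0.
    """
    if not log_values:
        return "no activity value"
    if len(log_values) == 1:
        return "only one data point"
    if max(log_values) - min(log_values) <= 1.0:
        return "no comment"
    return "check activity data"


def jaccard_tanimoto(bits_a, bits_b):
    """Jaccard-Tanimoto coefficient of two sets of fingerprint on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    # Two empty fingerprints are treated as identical (a choice made here).
    return len(a & b) / len(union) if union else 1.0


# Hypothetical means from two source databases for one compound-target pair.
print(activity_check([7.5, 7.9]))
print(activity_check([5.0, 7.5]))
# Hypothetical on-bit indices for the same compound in two databases.
print(jaccard_tanimoto({1, 2, 3}, {2, 3, 4}))
```

A similarity below 1.0 would correspond to the "no match" flag, with the minimum pairwise value across databases being the one reported.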
Climate Ready Boston Social Vulnerability
hub.arcgis.com
data.boston.gov
Updated Sep 21, 2017
Cite
BostonMaps (2017). Climate Ready Boston Social Vulnerability [Dataset]. https://hub.arcgis.com/datasets/34f2c48b670d4b43a617b1540f20efe3
Explore at:
Dataset updated
Sep 21, 2017
Dataset authored and provided by
BostonMaps
Area covered

Description
Social vulnerability is defined as the disproportionate susceptibility of some social groups to the impacts of hazards, including death, injury, loss, or disruption of livelihood. In this dataset from Climate Ready Boston, groups identified as being more vulnerable are older adults, children, people of color, people with limited English proficiency, people with low or no incomes, people with disabilities, and people with medical illnesses.

Source: The analysis and definitions used in Climate Ready Boston (2016) are based on "A framework to understand the relationship between social factors that reduce resilience in cities: Application to the City of Boston," published in 2015 in the International Journal of Disaster Risk Reduction by Atyia Martin, Northeastern University.

Population Definitions:

Older Adults: Older adults (those over age 65) have physical vulnerabilities in a climate event; they suffer from higher rates of medical illness than the rest of the population and can have some functional limitations in an evacuation scenario, as well as when preparing for and recovering from a disaster. Furthermore, older adults are physically more vulnerable to the impacts of extreme heat. Beyond the physical risk, older adults are more likely to be socially isolated. Without an appropriate support network, an initially small risk could be exacerbated if an older adult is not able to get help.
Data source: 2008-2012 American Community Survey 5-year Estimates (ACS) data by census tract for population over 65 years of age.
Attribute label: OlderAdult

Children: Families with children require additional resources in a climate event. When school is cancelled, parents need alternative childcare options, which can mean missing work. Children are especially vulnerable to extreme heat and stress following a natural disaster.
Data source: 2010 American Community Survey 5-year Estimates (ACS) data by census tract for population under 5 years of age.
Attribute label: TotChild

People of Color: People of color make up a majority (53 percent) of Boston’s population. People of color are more likely to fall into multiple vulnerable groups as well. People of color statistically have lower levels of income and higher levels of poverty than the population at large. People of color, many of whom also have limited English proficiency, may not have ready access in their primary language to information about the dangers of extreme heat or about cooling center resources. This risk to extreme heat can be compounded by the fact that people of color often live in more densely populated urban areas that are at higher risk for heat exposure due to the urban heat island effect.
Data source: 2008-2012 American Community Survey 5-year Estimates (ACS) data by census tract: Black, Native American, Asian, Island, Other, Multi, Non-white Hispanics.
Attribute label: POC2

Limited English Proficiency: Without adequate English skills, residents can miss crucial information on how to prepare for hazards. Cultural practices for information sharing, for example, may focus on word-of-mouth communication. In a flood event, residents can also face challenges communicating with emergency response personnel. If residents are more socially isolated, they may be less likely to hear about upcoming events. Finally, immigrants, especially ones who are undocumented, may be reluctant to use government services out of fear of deportation or general distrust of the government or emergency personnel.
Data source: 2008-2012 American Community Survey 5-year Estimates (ACS) data by census tract, defined as speaks English only or speaks English “very well”.
Attribute label: LEP

Low to no Income: A lack of financial resources impacts a household’s ability to prepare for a disaster event and to support friends and neighborhoods. For example, residents without televisions, computers, or data-driven mobile phones may face challenges getting news about hazards or recovery resources. Renters may have trouble finding and paying deposits for replacement housing if their residence is impacted by flooding. Homeowners may be less able to afford insurance that will cover flood damage. Having low or no income can create difficulty evacuating in a disaster event because of a higher reliance on public transportation. If unable to evacuate, residents may be more at risk without supplies to stay in their homes for an extended period of time. Low- and no-income residents can also be more vulnerable to hot weather if running air conditioning or fans puts utility costs out of reach.
Data source: 2008-2012 American Community Survey 5-year Estimates (ACS) data by census tract for low-to-no income populations. The data represents a calculated field that combines people who were 100% below the poverty level and those who were 100–149% of the poverty level.
Attribute label: Low_to_No

People with Disabilities: People with disabilities are among the most vulnerable in an emergency; they sustain disproportionate rates of illness, injury, and death in disaster events. People with disabilities can find it difficult to adequately prepare for a disaster event, including moving to a safer place. They are more likely to be left behind or abandoned during evacuations. Rescue and relief resources, such as emergency transportation or shelters, may not be universally accessible. Research has revealed a historic pattern of discrimination against people with disabilities in times of resource scarcity, like after a major storm and flood.
Data source: 2008-2012 American Community Survey 5-year Estimates (ACS) data by census tract for total civilian non-institutionalized population, including: hearing difficulty, vision difficulty, cognitive difficulty, ambulatory difficulty, self-care difficulty, and independent living difficulty.
Attribute label: TotDis

Medical Illness: Symptoms of existing medical illnesses are often exacerbated by hot temperatures. For example, heat can trigger asthma attacks or increase already high blood pressure due to the stress of high temperatures put on the body. Climate events can interrupt access to normal sources of healthcare and even life-sustaining medication. Special planning is required for people experiencing medical illness. For example, people dependent on dialysis will have different evacuation and care needs than other Boston residents in a climate event.
Data source: Medical illness is a proxy measure which is based on EASI data accessed through Simply Map. Health data at the local level in Massachusetts is not available beyond zip codes. EASI modeled the health statistics for the U.S. population based upon age, sex, and race probabilities using U.S. Census Bureau data. The probabilities are modeled against the census and current year and five year forecasts. Medical illness is the sum of asthma in children, asthma in adults, heart disease, emphysema, bronchitis, cancer, diabetes, kidney disease, and liver disease. A limitation is that these numbers may be over-counted as the result of people potentially having more than one medical illness. Therefore, the analysis may have greater numbers of people with medical illness within census tracts than actually present. Overall, the analysis was based on the relationship between social factors.
Attribute label: MedIllnes

Other attribute definitions:
GEOID10: Geographic identifier: State Code (25), County Code (025), 2010 Census Tract
AREA_SQFT: Tract area (in square feet)
AREA_ACRES: Tract area (in acres)
POP100_RE: Tract population count
HU100_RE: Tract housing unit count
Name: Boston Neighborhood
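
The GEOID10 definition above can be illustrated with a minimal Python sketch. It assumes the standard 2010 census tract layout of 11 digits (2-digit state code, 3-digit county code, 6-digit tract code). The function name `split_geoid10` and the sample identifier are hypothetical and are not taken from the dataset.

```python
from typing import NamedTuple


class TractId(NamedTuple):
    state: str   # 2-digit state FIPS code, e.g. "25" for Massachusetts
    county: str  # 3-digit county FIPS code, e.g. "025"
    tract: str   # 6-digit 2010 census tract code


def split_geoid10(geoid: str) -> TractId:
    """Split an 11-digit tract identifier into state, county and tract parts.

    Assumes the standard 2 + 3 + 6 digit layout; the value is kept as a
    string so that leading zeros are preserved.
    """
    geoid = str(geoid).strip()
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected 11 digits, got {geoid!r}")
    return TractId(geoid[:2], geoid[2:5], geoid[5:])


# Hypothetical identifier, used only to show the split.
parts = split_geoid10("25025010103")
print(parts.state, parts.county, parts.tract)
```

Reading the column as text rather than as a number is the safer choice here, since a numeric type would drop leading zeros for states with codes below 10.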
Multi-aspect Integrated Migration Indicators (MIMI) dataset
zenodo.org
explore.openaire.eu
csv
Updated Apr 24, 2025
Cite
Diletta Goglia; Diletta Goglia (2025). Multi-aspect Integrated Migration Indicators (MIMI) dataset [Dataset]. http://doi.org/10.5281/zenodo.6493325
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.6493325
Dataset updated
Apr 24, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Diletta Goglia; Diletta Goglia
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
New branches of research are proposing the use of non-traditional data sources for the study of migration trends, in order to find an original methodology to answer open questions about cross-border human mobility. The Multi-aspect Integrated Migration Indicators (MIMI) dataset is a new dataset for migration studies and a concrete example of this approach. It combines official data about bidirectional human migration (traditional flow and stock data) with multidisciplinary variables and original indicators, including economic, demographic, cultural and geographic indicators, together with the Facebook Social Connectedness Index (SCI). It is built by gathering, embedding and integrating traditional and novel variables, resulting in a multidisciplinary dataset that could significantly contribute to nowcasting and forecasting bilateral migration trends and migration drivers.

Thanks to this variety of knowledge, experts from several research fields (demographers, sociologists, economists) could exploit MIMI to investigate the trends in the various indicators, and the relationship among them. Moreover, it could be possible to develop complex models based on these data, able to assess human migration by evaluating related interdisciplinary drivers, as well as models able to nowcast and predict traditional migration indicators in accordance with original variables, such as the strength of social connectivity. Here, the SCI could have an important role. It measures the relative probability that two individuals across two countries are friends with each other on Facebook, therefore it could be employed as a proxy of social connections across borders, to be studied as a possible driver of migration.

All in all, the motivations for building and releasing the MIMI dataset lie in the need for new perspectives, methods and analyses that take a variety of new factors into account. The heterogeneous and multidimensional sets of data present in MIMI offer an all-encompassing overview of the characteristics of human migration, enabling a better understanding and an original exploration of the relationship between migration and non-traditional sources of data.

The MIMI dataset is made up of a single CSV file that includes 28,821 rows (records/entries) and 876 columns (variables/features/indicators). Each row is uniquely identified by a pair of countries, built by joining the two ISO-3166 alpha-2 codes for the origin and destination country, respectively. The main features of the dataset are the country-to-country bilateral migration flows and stocks, together with multidisciplinary variables measuring cultural, demographic, geographic and economic characteristics of the two countries, and the Facebook strength of connectedness of each pair.
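
As an illustration of the row identifier described above, the following Python sketch joins an origin and a destination ISO-3166 alpha-2 code into one key and splits it back. The underscore separator and the function names `pair_key` and `split_pair_key` are assumptions made for this example; the description does not state how the two codes are actually joined in the CSV file.

```python
def _normalize(code: str) -> str:
    """Upper-case a country code and check that it looks like alpha-2."""
    code = code.strip().upper()
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"not an ISO-3166 alpha-2 code: {code!r}")
    return code


def pair_key(origin: str, destination: str, sep: str = "_") -> str:
    """Build an ordered origin-destination key, e.g. 'IT_DE'.

    The separator is an assumption, not the dataset's documented format.
    """
    return f"{_normalize(origin)}{sep}{_normalize(destination)}"


def split_pair_key(key: str, sep: str = "_") -> tuple[str, str]:
    """Recover (origin, destination) from a key built by pair_key."""
    origin, destination = key.split(sep)
    return _normalize(origin), _normalize(destination)


print(pair_key("it", "de"))
print(split_pair_key("IT_DE"))
```

The key is ordered, so the pair Italy to Germany and the pair Germany to Italy are distinct rows, which matches the bidirectional nature of the flow and stock data.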

Related paper: Goglia, D., Pollacci, L., Sirbu, A. (2022). Dataset of Multi-aspect Integrated Migration Indicators. https://doi.org/10.5281/zenodo.6500885
GapMaps Live Location Intelligence Platform | GIS Data | Easy-to-use| One...
datarade.ai
.csv
Updated Aug 14, 2024
Cite
GapMaps (2024). GapMaps Live Location Intelligence Platform | GIS Data | Easy-to-use| One Login for Global access [Dataset]. https://datarade.ai/data-products/gapmaps-live-location-intelligence-platform-gis-data-easy-gapmaps
Explore at:
Available download formats: .csv
Dataset updated
Aug 14, 2024
Dataset authored and provided by
GapMaps
Area covered
Taiwan, United States of America, Nigeria, Malaysia, Kenya, United Arab Emirates, Egypt, Philippines, Saudi Arabia, Thailand
Description
GapMaps Live is an easy-to-use location intelligence platform available across 25 countries globally that allows you to visualise your own store data, combined with the latest demographic, economic and population movement intel right down to the micro level so you can make faster, smarter and surer decisions when planning your network growth strategy.

With one single login, you can access the latest estimates on resident and worker populations, census metrics (eg. age, income, ethnicity), consuming class, retail spend insights and point-of-interest data across a range of categories including fast food, cafe, fitness, supermarket/grocery and more.

Some of the world's biggest brands including McDonalds, Subway, Burger King, Anytime Fitness and Dominos use GapMaps Live as a vital strategic tool where business success relies on up-to-date, easy to understand, location intel that can power business case validation and drive rapid decision making.

Primary use cases for GapMaps Live include:

Retail Site Selection - Identify optimal locations for future expansion and benchmark performance across existing locations.

Customer Profiling: get a detailed understanding of the demographic profile of your customers and where to find more of them.

Analyse your catchment areas at granular grid levels using all the key metrics.

Target Marketing: Develop effective marketing strategies to acquire more customers.

Marketing / Advertising (Billboards/OOH, Marketing Agencies, Indoor Screens)

Customer Profiling

Target Marketing

Market Share Analysis

Some of the features our clients love about GapMaps Live include:
- View business locations, competitor locations, demographic, economic and social data around your business or selected location
- Understand consumer visitation patterns (“where from” and “where to”), frequency of visits, dwell time of visits, profiles of consumers and much more
- Save searched locations and drop pins
- Turn on/off all location listings by category
- View and filter data by metadata tags, for example hours of operation, contact details, services provided
- Combine public data in GapMaps with views of private data layers
- View data in layers to understand the impact of different data sources
- Share maps with teams
- Generate demographic reports and comparative analyses on different locations based on drive time, walk time or radius
- Access multiple countries and brands with a single logon
- Access multiple brands under a parent login
- Capture field data such as photos, notes and documents using GapMaps Connect and integrate with GapMaps Live to get detailed insights on existing and proposed store locations
Benchmarking - Raw Source Data
dataverse.harvard.edu
Updated May 6, 2025
Cite
Diomar Anez; Dimar Anez (2025). Benchmarking - Raw Source Data [Dataset]. http://doi.org/10.7910/DVN/JKDONM
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/JKDONM
Dataset updated
May 6, 2025
Dataset provided by
Harvard Dataverse
Authors
Diomar Anez; Dimar Anez
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains raw, unprocessed data files pertaining to the management tool 'Benchmarking'. The data originates from five distinct sources, each reflecting different facets of the tool's prominence and usage over time. Files preserve the original metrics and temporal granularity before any comparative normalization or harmonization.

Data Sources & File Details:

Google Trends File (Prefix: GT_):
Metric: Relative Search Interest (RSI) Index (0-100 scale).
Keywords Used: "benchmarking" + "benchmarking management"
Time Period: January 2004 - January 2025 (Native Monthly Resolution).
Scope: Global Web Search, broad categorization.
Extraction Date: Data extracted January 2025.
Notes: Index relative to peak interest within the period for these terms. Reflects public/professional search interest trends. Based on probabilistic sampling.
Source URL: Google Trends Query

Google Books Ngram Viewer File (Prefix: GB_):
Metric: Annual Relative Frequency (% of total n-grams in the corpus).
Keywords Used: Benchmarking
Time Period: 1950 - 2022 (Annual Resolution).
Corpus: English.
Parameters: Case Insensitive OFF, Smoothing 0.
Extraction Date: Data extracted January 2025.
Notes: Reflects term usage frequency in Google's digitized book corpus. Subject to corpus limitations (English bias, coverage).
Source URL: Ngram Viewer Query

Crossref.org File (Prefix: CR_):
Metric: Absolute count of publications per month matching keywords.
Keywords Used: "benchmarking" AND ("process" OR "management" OR "performance" OR "best practices" OR "implementation" OR "approach" OR "evaluation" OR "methodology")
Time Period: 1950 - 2025 (Queried for monthly counts based on publication date metadata).
Search Fields: Title, Abstract.
Extraction Date: Data extracted January 2025.
Notes: Reflects volume of relevant academic publications indexed by Crossref. Deduplicated using DOIs; records without DOIs omitted.
Source URL: Crossref Search Query

Bain & Co. Survey - Usability File (Prefix: BU_):
Metric: Original Percentage (%) of executives reporting tool usage.
Tool Names/Years Included: Benchmarking (1993, 1996, 1999, 2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2017).
Respondent Profile: CEOs, CFOs, COOs, other senior leaders; global, multi-sector.
Source: Bain & Company Management Tools & Trends publications (Rigby D., Bilodeau B., et al., various years: 1994, 2001, 2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017).
Note: Tool not included in the 2022 survey data.
Data Compilation Period: July 2024 - January 2025.
Notes: Data points correspond to specific survey years. Sample sizes: 1993/500; 1996/784; 1999/475; 2000/214; 2002/708; 2004/960; 2006/1221; 2008/1430; 2010/1230; 2012/1208; 2014/1067; 2017/1268.

Bain & Co. Survey - Satisfaction File (Prefix: BS_):
Metric: Original Average Satisfaction Score (Scale 0-5).
Tool Names/Years Included: Benchmarking (1993, 1996, 1999, 2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2017).
Respondent Profile: CEOs, CFOs, COOs, other senior leaders; global, multi-sector.
Source: Bain & Company Management Tools & Trends publications (Rigby D., Bilodeau B., et al., various years: 1994, 2001, 2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017).
Note: Tool not included in the 2022 survey data.
Data Compilation Period: July 2024 - January 2025.
Notes: Data points correspond to specific survey years. Sample sizes: 1993/500; 1996/784; 1999/475; 2000/214; 2002/708; 2004/960; 2006/1221; 2008/1430; 2010/1230; 2012/1208; 2014/1067; 2017/1268. Reflects subjective executive perception of utility.

File Naming Convention: Files generally follow the pattern PREFIX_Tool.csv, where the PREFIX indicates the data source:
GT_: Google Trends
GB_: Google Books Ngram
CR_: Crossref.org (Count Data for this Raw Dataset)
BU_: Bain & Company Survey (Usability)
BS_: Bain & Company Survey (Satisfaction)
The essential identification comes from the PREFIX and the Tool Name segment.

This dataset resides within the 'Management Tool Source Data (Raw Extracts)' Dataverse.
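
The PREFIX_Tool.csv naming convention described above lends itself to a small parsing helper. The Python sketch below maps the five prefixes to their sources and extracts the tool name segment. The function name `parse_source_file` and the sample file name are hypothetical; since files only "generally" follow the pattern, real file names may differ and should be checked before relying on this.

```python
from pathlib import Path

# Prefix-to-source mapping taken from the file naming convention.
SOURCES = {
    "GT": "Google Trends",
    "GB": "Google Books Ngram",
    "CR": "Crossref.org",
    "BU": "Bain & Company Survey (Usability)",
    "BS": "Bain & Company Survey (Satisfaction)",
}


def parse_source_file(filename: str) -> tuple[str, str]:
    """Return (source name, tool name) for a PREFIX_Tool.csv file name."""
    stem = Path(filename).stem                # drop directory and ".csv"
    prefix, sep, tool = stem.partition("_")   # split at the first underscore
    if not sep or not tool or prefix not in SOURCES:
        raise ValueError(f"unrecognized file name: {filename!r}")
    return SOURCES[prefix], tool


# Hypothetical file name following the stated pattern.
print(parse_source_file("GT_Benchmarking.csv"))
```

Splitting only at the first underscore keeps tool names that themselves contain underscores intact.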
BigQuery Sample Tables
kaggle.com
zip
Updated Sep 4, 2018
Cite
Google BigQuery (2018). BigQuery Sample Tables [Dataset]. https://www.kaggle.com/bigquery/samples
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Sep 4, 2018
Dataset provided by
Google (http://google.com/)
BigQuery (https://cloud.google.com/bigquery)
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

BigQuery provides a limited number of sample tables that you can run queries against. These tables are suited for testing queries and learning BigQuery.

Content

gsod: Contains weather information collected by NOAA, such as precipitation amounts and wind speeds from late 1929 to early 2010.

github_nested: Contains a timeline of actions such as pull requests and comments on GitHub repositories with a nested schema. Created in September 2012.

github_timeline: Contains a timeline of actions such as pull requests and comments on GitHub repositories with a flat schema. Created in May 2012.

natality: Describes all United States births registered in the 50 States, the District of Columbia, and New York City from 1969 to 2008.

shakespeare: Contains a word index of the works of Shakespeare, giving the number of times each word appears in each corpus.

trigrams: Contains English language trigrams from a sample of works published between 1520 and 2008.

wikipedia: Contains the complete revision history for all Wikipedia articles up to April 2010.

Fork this kernel to get started.

Acknowledgements

Data Source: https://cloud.google.com/bigquery/sample-tables

Banner Photo by Mervyn Chan from Unsplash.

Inspiration

How many babies were born in New York City on Christmas Day?

How many words are in the play Hamlet?
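
The Hamlet question above reduces to summing per-word counts over the rows that belong to one work. The following Python sketch shows that aggregation on a few in-memory rows. It assumes rows shaped like the shakespeare sample table, with `word`, `word_count` and `corpus` fields; the rows and their counts are invented for illustration and are not real data from the table.

```python
from typing import Iterable, Mapping


def total_words(rows: Iterable[Mapping], corpus: str) -> int:
    """Sum word_count over all rows whose corpus matches the given work."""
    return sum(
        row["word_count"]
        for row in rows
        if row["corpus"].lower() == corpus.lower()
    )


# Made-up rows that only mimic the assumed column layout.
toy_rows = [
    {"word": "to", "word_count": 3, "corpus": "hamlet"},
    {"word": "be", "word_count": 2, "corpus": "hamlet"},
    {"word": "love", "word_count": 5, "corpus": "sonnets"},
]

print(total_words(toy_rows, "Hamlet"))
```

Against the real table the same logic would be expressed as a grouped sum in a query; the plain-Python version only makes the aggregation explicit.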
Customer Segmentation - Raw Source Data
dataverse.harvard.edu
Updated May 6, 2025
Cite
Diomar Anez; Dimar Anez (2025). Customer Segmentation - Raw Source Data [Dataset]. http://doi.org/10.7910/DVN/0NS2KB
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/0NS2KB
Dataset updated
May 6, 2025
Dataset provided by
Harvard Dataverse
Authors
Diomar Anez; Dimar Anez
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains raw, unprocessed data files pertaining to the management tool 'Customer Segmentation', including the closely related concept of Market Segmentation. The data originates from five distinct sources, each reflecting different facets of the tool's prominence and usage over time. Files preserve the original metrics and temporal granularity before any comparative normalization or harmonization.

Data Sources & File Details:

Google Trends File (Prefix: GT_):
Metric: Relative Search Interest (RSI) Index (0-100 scale).
Keywords Used: "customer segmentation" + "market segmentation" + "customer segmentation marketing"
Time Period: January 2004 - January 2025 (Native Monthly Resolution).
Scope: Global Web Search, broad categorization.
Extraction Date: Data extracted January 2025.
Notes: Index relative to peak interest within the period for these terms. Reflects public/professional search interest trends. Based on probabilistic sampling.
Source URL: Google Trends Query

Google Books Ngram Viewer File (Prefix: GB_):
Metric: Annual Relative Frequency (% of total n-grams in the corpus).
Keywords Used: Customer Segmentation + Market Segmentation
Time Period: 1950 - 2022 (Annual Resolution).
Corpus: English.
Parameters: Case Insensitive OFF, Smoothing 0.
Extraction Date: Data extracted January 2025.
Notes: Reflects term usage frequency in Google's digitized book corpus. Subject to corpus limitations (English bias, coverage).
Source URL: Ngram Viewer Query

Crossref.org File (Prefix: CR_):
Metric: Absolute count of publications per month matching keywords.
Keywords Used: ("customer segmentation" OR "market segmentation") AND ("marketing" OR "strategy" OR "management" OR "targeting" OR "analysis" OR "approach" OR "practice")
Time Period: 1950 - 2025 (Queried for monthly counts based on publication date metadata).
Search Fields: Title, Abstract.
Extraction Date: Data extracted January 2025.
Notes: Reflects volume of relevant academic publications indexed by Crossref. Deduplicated using DOIs; records without DOIs omitted.
Source URL: Crossref Search Query

Bain & Co. Survey - Usability File (Prefix: BU_):
Metric: Original Percentage (%) of executives reporting tool usage.
Tool Names/Years Included: Customer Segmentation (1999, 2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2017).
Respondent Profile: CEOs, CFOs, COOs, other senior leaders; global, multi-sector.
Source: Bain & Company Management Tools & Trends publications (Rigby D., Bilodeau B., et al., various years: 2001, 2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017).
Note: Tool not included in the 2022 survey data.
Data Compilation Period: July 2024 - January 2025.
Notes: Data points correspond to specific survey years. Sample sizes: 1999/475; 2000/214; 2002/708; 2004/960; 2006/1221; 2008/1430; 2010/1230; 2012/1208; 2014/1067; 2017/1268.

Bain & Co. Survey - Satisfaction File (Prefix: BS_):
Metric: Original Average Satisfaction Score (Scale 0-5).
Tool Names/Years Included: Customer Segmentation (1999, 2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2017).
Respondent Profile: CEOs, CFOs, COOs, other senior leaders; global, multi-sector.
Source: Bain & Company Management Tools & Trends publications (Rigby D., Bilodeau B., et al., various years: 2001, 2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017).
Note: Tool not included in the 2022 survey data.
Data Compilation Period: July 2024 - January 2025.
Notes: Data points correspond to specific survey years. Sample sizes: 1999/475; 2000/214; 2002/708; 2004/960; 2006/1221; 2008/1430; 2010/1230; 2012/1208; 2014/1067; 2017/1268. Reflects subjective executive perception of utility.

File Naming Convention: Files generally follow the pattern PREFIX_Tool.csv, where the PREFIX indicates the data source:
GT_: Google Trends
GB_: Google Books Ngram
CR_: Crossref.org (Count Data for this Raw Dataset)
BU_: Bain & Company Survey (Usability)
BS_: Bain & Company Survey (Satisfaction)
The essential identification comes from the PREFIX and the Tool Name segment.

This dataset resides within the 'Management Tool Source Data (Raw Extracts)' Dataverse.
Factori USA Consumer Graph Data | socio-demographic, location, interest and...
datarade.ai
.json, .csv
Updated Jul 23, 2022
Cite
Factori (2022). Factori USA Consumer Graph Data | socio-demographic, location, interest and intent data | E-Commere |Mobile Apps | Online Services [Dataset]. https://datarade.ai/data-products/factori-usa-consumer-graph-data-socio-demographic-location-factori
Explore at:
Available download formats: .json, .csv
Dataset updated
Jul 23, 2022
Dataset authored and provided by
Factori
Area covered
United States of America
Description
Our consumer data is gathered and aggregated via surveys, digital services, and public data sources. We use powerful profiling algorithms to collect and ingest only fresh and reliable data points.

Our comprehensive data enrichment solution includes a variety of data sets that can help you address gaps in your customer data, gain a deeper understanding of your customers, and power superior client experiences.

Geography - City, State, ZIP, County, CBSA, Census Tract, etc.

Demographics - Gender, Age Group, Marital Status, Language etc.

Financial - Income Range, Credit Rating Range, Credit Type, Net worth Range, etc

Persona - Consumer type, Communication preferences, Family type, etc

Interests - Content, Brands, Shopping, Hobbies, Lifestyle etc.

Household - Number of Children, Number of Adults, IP Address, etc.

Behaviours - Brand Affinity, App Usage, Web Browsing etc.

Firmographics - Industry, Company, Occupation, Revenue, etc

Retail Purchase - Store, Category, Brand, SKU, Quantity, Price etc.

Auto - Car Make, Model, Type, Year, etc.

Housing - Home type, Home value, Renter/Owner, Year Built etc.

Consumer Graph Schema & Reach: Our data reach represents the total number of counts available within various categories and comprises attributes such as country location, MAU, DAU & Monthly Location Pings:

Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method on a suitable interval (daily/weekly/monthly).

Consumer Graph Use Cases:

360-Degree Customer View: Get a comprehensive picture of customers by aggregating internal and external data.

Data Enrichment: Leverage online-to-offline consumer profiles to build holistic audience segments and improve campaign targeting through user data enrichment.

Fraud Detection: Use multiple digital (web and mobile) identities to verify real users and detect anomalies or fraudulent activity.

Advertising & Marketing: Understand audience demographics, interests, lifestyle, hobbies, and behaviors to build targeted marketing campaigns.

Using Factori Consumer Data graph you can solve use cases like:

Acquisition Marketing: Expand your reach to new users and customers by using lookalike modeling with your first-party audiences to reach other potential consumers with similar traits and attributes.

Lookalike Modeling

Build lookalike audience segments using your first party audiences as a seed to extend your reach for running marketing campaigns to acquire new users or customers

Other use cases include CRM Data Enrichment, Consumer Data Enrichment, B2B Data Enrichment, B2C Data Enrichment, Customer Acquisition, Audience Segmentation, 360-Degree Customer View, Consumer Profiling, and Consumer Behaviour Data.
New Arenas for Civic Expansion: Humans, Animals and Artificial Intelligence,...
datacatalogue.cessda.eu
beta.ukdataservice.ac.uk
Updated Jun 13, 2025
Cite
Chaney, P; Jones, I; Fevre, R; Kathleen, J; Nivedita, N (2025). New Arenas for Civic Expansion: Humans, Animals and Artificial Intelligence, 2020-2024 [Dataset]. http://doi.org/10.5255/UKDA-SN-857597
Explore at:
Unique identifier
https://doi.org/10.5255/UKDA-SN-857597
Dataset updated
Jun 13, 2025
Dataset provided by
Cardiff University
Authors
Chaney, P; Jones, I; Fevre, R; Kathleen, J; Nivedita, N
Time period covered
Jun 1, 2020 - Sep 30, 2024
Area covered
United Kingdom, India
Variables measured
Individual, Organization
Measurement technique
This collection contains 68 semi-structured interviews with 67 participants. Some participants were interviewed twice. 13 of these interviews were conducted about animal rights in India, 25 about animal rights in the United Kingdom, and a further 30 about artificial intelligence in the United Kingdom. Purposive sampling provided a cross-section of interviewees from different Non-Governmental and Civil Society Organisations, judged by organisation characteristics such as size, wealth, location, number of employees and specialism. The interviews were conducted either remotely or face-to-face.
Description
This project involved cross-national qualitative research which explored what factors shape individualism, and human and non-human relations in civil society, with reference to animal rights and welfare, and artificial intelligence. Interviews were carried out to explore the framing of animal rights and animal welfare in Civil Society Organisations’ advocacy and campaigning materials in order to understand how they express and reflect civil society views on animal rights and animal welfare. We specifically explored how they seek to recast and challenge traditional conceptions of civil society to take fuller account of human and non-human relations. For animal rights and artificial intelligence, interviews were conducted in Civil Society Organisations in the United Kingdom. Further interviews were conducted on animal rights in Non-Governmental Organisations in India.
WISERD celebrates its 10th anniversary this year. Over time it has grown into an international research institute that develops the next generation of research leaders. Our research brings together different disciplines (geographers, economists, sociologists, data scientists, political scientists) to address important issues for civil society at national and international levels. Our social science core provides a strong foundation for working with other disciplines including environmental science, engineering and medicine to transform our understanding and approaches to key areas of public concern. Our aim is to provide evidence that informs and changes policy and practice. This Centre will build on all previous WISERD research activities to undertake an ambitious new research programme. Our focus will be on the concept of civic stratification. This is a way of looking at divisions in society by focusing on the rights and obligations and practices of citizens and the role of civil society organisations in addressing inequalities in those rights and obligations. We will examine and analyse instances where people do not have the same rights as others (for example people who are migrants or refugees). We will also look at examples of people and groups working together within civil society to win new rights; this is referred to as civic expansion. Examples might include campaigns for animal rights or concerns about robots and Artificial Intelligence. We will investigate situations where people have the same rights but experience differences in their ability to access those rights; sometimes referred to as civic gain and civic loss (for example some people are better able to access legal services than others). Lastly, we will explore how individuals and groups come together to overcome deficits in their rights and citizenship; sometimes referred to as forms of civil repair. 
This might include ways in which people are looking at alternative forms of economic organisation, at local sustainability and at using new technologies (platforms and software) to organise and campaign for their rights. Our centre will deliver across four key areas of activity. First our research programme will focus on themes that address the different aspects of civic stratification. We will examine trends in polarization of economic, political and social rights, looking at how campaigns for rights are changing and undertaking case studies of attempts to repair the fabric of civil life. Second, we will extend and deepen our international and civil society research partnerships and networks and by doing so strengthen our foundations for developing further joint research in the future. Third, we will implement an exciting and accessible 'knowledge exchange' programme to enable our research and evidence to reach, involve and influence as many people as possible. Fourth, we will expand the capacity of social science research and nurture future research leaders. All our research projects will be jointly undertaken with key partners including civil society organisations, such as charities, and local communities. The research programme is broad and will include the collection of new data, the exploitation of existing data sources and linking existing sets of data. The data will range from local detailed studies to large cross-national comparisons. We will make the most of our skills and abilities to work with major RCUK research investments. We have an outstanding track record in maximising research impact, in applying a wide range of research methods to real world problems. This exciting and challenging research programme is based on a unique, long standing and supportive relationship between five core universities in Wales and our partnerships with universities and research institutes in the UK and internationally. 
It addresses priority areas identified by the ESRC and by governments and is informed by our continued close links with civil society organisations.
MIMIC-IV
physionet.org
Updated Oct 11, 2024
Cite
Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark (2024). MIMIC-IV [Dataset]. http://doi.org/10.13026/kpb9-mt58
Unique identifier
https://doi.org/10.13026/kpb9-mt58
Dataset updated
Oct 11, 2024
Authors
Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. Here we present Medical Information Mart for Intensive Care (MIMIC)-IV, a large deidentified dataset of patients admitted to the emergency department or an intensive care unit at the Beth Israel Deaconess Medical Center in Boston, MA. MIMIC-IV contains data for over 65,000 patients admitted to an ICU and over 200,000 patients admitted to the emergency department. MIMIC-IV incorporates contemporary data and adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.
Alternative Data Solution Report
marketresearchforecast.com
doc, pdf, ppt
Updated Mar 6, 2025
Cite
Market Research Forecast (2025). Alternative Data Solution Report [Dataset]. https://www.marketresearchforecast.com/reports/alternative-data-solution-28310
Explore at:
Available download formats: ppt, pdf, doc
Dataset updated
Mar 6, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The alternative data solutions market, valued at $2,882.2 million in 2025, is experiencing robust growth driven by the increasing need for enhanced investment strategies and improved business decision-making. The rising adoption of data analytics and machine learning across various sectors, including BFSI, retail & logistics, and IT & telecommunications, fuels this expansion. Credit card transactions and web traffic currently represent significant data sources, though mobile application usage is rapidly gaining traction. While data privacy regulations present a challenge, the market's resilience is evident in the diverse range of alternative data providers, including established players like Equifax and emerging companies like Alternative Data Group and FinScience, constantly innovating to meet evolving market demands. The market's segmentation by application and data type reflects the versatility of alternative data, catering to specific industry needs. For example, BFSI institutions leverage alternative data for credit scoring and fraud detection, while retail and logistics firms use it for supply chain optimization and customer behavior analysis. Geographic distribution shows strong growth potential across North America and Europe, with Asia-Pacific emerging as a key region for future expansion. This growth is fuelled by increasing digitalization and the proliferation of data sources in these regions. The forecast period (2025-2033) anticipates sustained growth, propelled by technological advancements and the growing recognition of alternative data’s value in unlocking actionable insights. The competitive landscape is dynamic, with both established players and agile startups contributing to market innovation. Companies are continuously developing sophisticated analytical tools and expanding their data sources to offer comprehensive solutions. 
Furthermore, partnerships and collaborations between data providers and technology companies are further accelerating market growth. The continuous evolution of data analytics techniques and the increasing sophistication of AI-driven insights further contribute to market expansion. The market is expected to consolidate somewhat in the coming years, with larger players potentially acquiring smaller, more specialized firms to broaden their data offerings and expand their market reach. This market growth, coupled with ongoing innovation, positions alternative data solutions as a crucial element of modern business intelligence.
Protected Areas Database of the United States (PAD-US) 1.4
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
Cite
U.S. Geological Survey (2024). Protected Areas Database of the United States (PAD-US) 1.4 [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-1-4
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
United States
Description
NOTE: A more current version of the Protected Areas Database of the United States (PAD-US) is available: PAD-US 2.0 https://doi.org/10.5066/P955KPLE. The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public open space and voluntarily provided, private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastral Theme (http://www.fgdc.gov/ngda-reports/NGDA_Datasets.html). PAD-US is an ongoing project with several published versions of a spatial database of areas dedicated to the preservation of biological diversity, and other natural, recreational or cultural uses, managed for these purposes through legal or other effective means. The geodatabase maps and describes public open space and other protected areas. Most areas are public lands owned in fee; however, long-term easements, leases, and agreements or administrative designations documented in agency management plans may be included. The PAD-US database strives to be a complete “best available” inventory of protected areas (lands and waters) including data provided by managing agencies and organizations. The dataset is built in collaboration with several partners and data providers (http://gapanalysis.usgs.gov/padus/stewards/). See the Supplemental Information Section of this metadata record for more information on partnerships and links to major partner organizations. Because this dataset is a compilation of many data sets, data completeness, accuracy, and scale may vary. Federal and state data are generally complete, while local government and private protected area coverage is about 50% complete and depends on data management capacity in the state. For completeness estimates by state, see http://www.protectedlands.net/partners. As the federal and state data are reasonably complete, focus is shifting to completing the inventory of local government and voluntarily provided, private protected areas.
The PAD-US geodatabase contains over twenty-five attributes and four feature classes to support data management, queries, web mapping services and analyses: Marine Protected Areas (MPA), Fee, Easements and Combined. The data contained in the MPA feature class are provided directly by the National Oceanic and Atmospheric Administration (NOAA) Marine Protected Areas Center (MPA, http://marineprotectedareas.noaa.gov), which tracks the National Marine Protected Areas System. The Easements feature class contains data provided directly from the National Conservation Easement Database (NCED, http://conservationeasement.us). The MPA and Easement feature classes contain some attributes unique to the sole source databases tracking them (e.g. Easement Holder Name from NCED, Protection Level from NOAA MPA Inventory). The "Combined" feature class integrates all fee, easement and MPA features as the best available national inventory of protected areas in the standard PAD-US framework. In addition to geographic boundaries, PAD-US describes the protection mechanism category (e.g. fee, easement, designation, other), owner and managing agency, designation type, unit name, area, public access and state name in a suite of standardized fields. An informative set of references (i.e. Aggregator Source, GIS Source, GIS Source Date) and "local" or source data fields provide a transparent link between standardized PAD-US fields and information from authoritative data sources. The areas in PAD-US are also assigned conservation measures that assess management intent to permanently protect biological diversity: the nationally relevant "GAP Status Code" and global "IUCN Category" standard. A wealth of attributes facilitates a wide variety of data analyses and creates a context for data to be used at local, regional, state, national and international scales.
More information about specific updates and changes to this PAD-US version can be found in the Data Quality Information section of this metadata record as well as on the PAD-US website, http://gapanalysis.usgs.gov/padus/data/history/. Due to the completeness and complexity of these data, it is highly recommended to review the Supplemental Information Section of the metadata record as well as the Data Use Constraints, to better understand data partnerships and to see tips and ideas on appropriate uses of the data and how to parse out the data that you are looking for. For more information regarding the PAD-US dataset, please visit http://gapanalysis.usgs.gov/padus/. To find more data resources and to view example analyses performed using PAD-US data, visit http://gapanalysis.usgs.gov/padus/resources/. The PAD-US dataset and data standard are compiled and maintained by the USGS Gap Analysis Program, http://gapanalysis.usgs.gov/ . For more information about data standards and how the data are aggregated, please review the “Standards and Methods Manual for PAD-US,” http://gapanalysis.usgs.gov/padus/data/standards/ .
Enterprise Survey 2009 - Samoa
microdata.worldbank.org
microdata.pacificdata.org
+3 more
Updated Sep 26, 2013
Cite
World Bank (2013). Enterprise Survey 2009 - Samoa [Dataset]. https://microdata.worldbank.org/index.php/catalog/343
Dataset updated
Sep 26, 2013
Dataset authored and provided by
World Bank (http://worldbank.org/)
Time period covered
2009
Area covered
Samoa
Description
Abstract

This research is an Indicator Survey conducted in Samoa from May 25 to Oct. 9, 2009, as part of the Enterprise Survey initiative. An Indicator Survey, which is similar to an Enterprise Survey, is implemented for smaller economies where the sampling strategies inherent in an Enterprise Survey are often not applicable due to the limited universe of firms.

The objective of the survey is to obtain feedback from enterprises on the state of the private sector as well as to help in building a panel of enterprise data that will make it possible to track changes in the business environment over time, thus allowing, for example, impact assessments of reforms. Through interviews with firms in the manufacturing and services sectors, the survey assesses the constraints to private sector growth and creates statistically significant business environment indicators that are comparable across countries.

Questionnaire topics include firm characteristics, gender participation, access to finance, annual sales, costs of inputs/labor, workforce composition, bribery, licensing, infrastructure, trade, crime, competition, land and permits, taxation, business-government relations, and performance measures.

Geographic coverage

National

Analysis unit

The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must make its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.

Universe

The whole population, or the universe, covered in the Enterprise Surveys is the non-agricultural economy. It comprises: all manufacturing sectors according to the ISIC Revision 3.1 group classification (group D), construction sector (group F), services sector (groups G and H), and transport, storage, and communications sector (group I). Note that this population definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except sub-sector 72, IT, which was added to the population under study), and all public or utilities sectors.

Kind of data

Sample survey data [ssd]

Sampling procedure

The sample for Samoa was selected using stratified random sampling. Two levels of stratification were used in this country: industry and establishment size.

Industry stratification was designed as follows: the universe was stratified into 23 manufacturing industries and one services sector.

Size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not a common practice, except in the sectors of construction and agriculture.
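
The size classes and the two-level stratification described above can be sketched as follows. This is a minimal illustration, not the World Bank's sampling code: the function names, the `per_stratum` parameter, the record fields and the sample frame are all hypothetical, and the under-5 cutoff is inferred only from the "small (5 to 19 employees)" lower bound.

```python
import random
from collections import defaultdict


def size_stratum(permanent_full_time_workers):
    """Map a head count to the size classes quoted above; None means out of scope."""
    if permanent_full_time_workers < 5:
        return None  # assumption: below the smallest class, so not sampled
    if permanent_full_time_workers <= 19:
        return "small"
    if permanent_full_time_workers <= 99:
        return "medium"
    return "large"


def stratified_sample(frame, per_stratum, seed=0):
    """Draw up to per_stratum establishments at random from each (industry, size) cell."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for est in frame:
        size = size_stratum(est["workers"])
        if size is not None:
            cells[(est["industry"], size)].append(est)
    sample = []
    for key in sorted(cells):
        members = cells[key]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# Hypothetical sampling frame; names, industries and head counts are invented.
frame = [
    {"name": f"firm-{i}", "industry": "manufacturing" if i % 2 else "services", "workers": w}
    for i, w in enumerate([3, 7, 12, 25, 60, 150, 400, 8, 18, 99, 100])
]
sample = stratified_sample(frame, per_stratum=1)
```

Drawing independently inside each cell is what distinguishes stratified sampling from a single random draw over the whole frame: every industry and size combination that exists in the frame is represented.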

Regional stratification did not take place as only the island of Upolu, containing the capital city of Apia, was surveyed. Of the two islands that make up the majority of Samoa, Upolu has the largest population.

Because few data sources on registered businesses are available in Samoa, the final sample frame was built from a combined dataset obtained from the Samoa National Provident Fund (SNPF). The SNPF list included only the sector and location of enterprises, with no details on the number of employees. Therefore, original sample counts could not be stratified by enterprise size. The combined sample frame was then reviewed, and duplicate establishments or establishments with ineligible characteristics (industry sector, number of employees, geographic location) were removed from the list. The modified sample frame was used to select the sample of establishments for the full survey. This database contained the following information:
- Name of the firm
- Contact details
- Location
- ISIC code

Given the impact that non-eligible units included in the sample universe may have on the results, adjustments may be needed when computing the appropriate weights for individual observations. Confirmed non-eligible units made up about 50% of the sampled establishments contacted for the survey (416 out of 835 establishments). By industry, the numbers of establishments surveyed were: Manufacturing - 24, Services - 85.
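
The figures quoted above can be checked with a few lines of arithmetic. The variable names are illustrative; how the non-eligible share actually enters the survey weights is not stated here and is left to the implementation note referenced under "Response rate".

```python
# Counts taken from the paragraph above.
contacted = 835
non_eligible = 416
surveyed = {"Manufacturing": 24, "Services": 85}

non_eligible_share = non_eligible / contacted
total_surveyed = sum(surveyed.values())

print(f"non-eligible share: {non_eligible_share:.1%}")  # non-eligible share: 49.8%
print(f"completed interviews: {total_surveyed}")        # completed interviews: 109
```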

Mode of data collection

Face-to-face [f2f]

Research instrument

The current survey instruments are available:
- Services Questionnaire
- Manufacturing Questionnaire
- Screener Questionnaire

The Services Questionnaire is administered to the establishments in the services sector. The Manufacturing Questionnaire is built upon the Services Questionnaire and adds specific questions relevant to manufacturing.

The standard Enterprise Survey topics include firm characteristics, gender participation, access to finance, annual sales, costs of inputs/labor, workforce composition, bribery, licensing, infrastructure, trade, crime, competition, capacity utilization, land and permits, taxation, informality, business-government relations, innovation and technology, and performance measures. Over 90% of the questions objectively ascertain characteristics of a country’s business environment. The remaining questions assess the survey respondents’ opinions on what are the obstacles to firm growth and performance.

Cleaning operations

Data entry and quality controls are implemented by the contractor and data is delivered to the World Bank in batches (typically 10%, 50% and 100%). These data deliveries are checked for logical consistency, out of range values, skip patterns, and duplicate entries. Problems are flagged by the World Bank and corrected by the implementing contractor through data checks, callbacks, and revisiting establishments.
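
Two of the checks named above, duplicate entries and out-of-range values, might look like the sketch below. It is only an illustration under assumed inputs: the record layout, the field names (`id`, `workers`, `pct_exports`) and the allowed ranges are hypothetical, not the World Bank's actual validation rules.

```python
from collections import Counter


def check_batch(records, ranges):
    """Return (record id, problem) pairs for duplicates and out-of-range values."""
    problems = []
    id_counts = Counter(r["id"] for r in records)
    for rec_id, n in id_counts.items():
        if n > 1:
            problems.append((rec_id, "duplicate entry"))
    for r in records:
        for field, (low, high) in ranges.items():
            value = r.get(field)
            if value is not None and not (low <= value <= high):
                problems.append((r["id"], f"{field} out of range"))
    return problems


# Hypothetical batch: one duplicated id and one record with impossible values.
records = [
    {"id": "A1", "workers": 12, "pct_exports": 20},
    {"id": "A2", "workers": -4, "pct_exports": 130},
    {"id": "A1", "workers": 12, "pct_exports": 20},
]
ranges = {"workers": (0, 100_000), "pct_exports": (0, 100)}
problems = check_batch(records, ranges)
```

Each flagged pair corresponds to an item that, in the workflow described above, would be sent back to the contractor for a callback or revisit.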

Response rate

Complete information regarding the sampling methodology, sample frame, weights, response rates, and implementation can be found in "Description of Samoa Implementation 2009" in "Technical Documents" folder.