
Frequently leveraged external data sources for global enterprises 2020
statista.com
Updated Jul 1, 2025
Cite
Statista (2025). Frequently leveraged external data sources for global enterprises 2020 [Dataset]. https://www.statista.com/statistics/1235514/worldwide-popular-external-data-sources-companies/
Dataset updated
Jul 1, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Aug 2020
Area covered
Worldwide
Description
In 2020, according to respondents surveyed, data masters typically leveraged a variety of external data sources to enhance their insights. The most popular external data sources for data masters were publicly available competitor data, open data, and proprietary datasets from data aggregators, at **, **, and ** percent, respectively.

sEH External Data

kaggle.com

zip

Updated Jun 7, 2024

Cite

kyu999 (2024). sEH External Data [Dataset]. https://www.kaggle.com/datasets/kyu999/seh-external-data

Available download formats: zip (57,251,119 bytes)

Dataset updated

Jun 7, 2024

Authors

kyu999

License

MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically

Description

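A dataset created by applying the following processing to the external sEH data:

Replaced the DNA linker with Dy
Added a binds column
Added an is_triazine column
Converted each building block to an ID
Converted molecule_smiles to canonical form (just to be safe)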
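The building-block-to-ID step can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the author's script: the input column names `buildingblock1_smiles` to `buildingblock3_smiles` are assumed (they match the Leash BELKA competition layout), and IDs are assigned in first-seen order, so they will not reproduce the exact IDs (e.g. 3501) shown in the example below. Canonicalizing SMILES would typically use a cheminformatics toolkit such as RDKit (`Chem.MolToSmiles(Chem.MolFromSmiles(smiles))`), which is left out here.

```python
# Assumed input column names (hypothetical for this dataset).
BB_COLUMNS = ["buildingblock1_smiles", "buildingblock2_smiles", "buildingblock3_smiles"]


def build_vocab(rows, columns=BB_COLUMNS):
    """Assign an integer ID to each distinct building-block SMILES, in first-seen order.
    One shared vocabulary is used across all three positions."""
    vocab = {}
    for row in rows:
        for col in columns:
            vocab.setdefault(row[col], len(vocab))
    return vocab


def encode_building_blocks(rows, vocab, columns=BB_COLUMNS):
    """Replace each building-block SMILES column with an integer *_id column."""
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in columns}
        for col in columns:
            new_row[col.replace("_smiles", "_id")] = vocab[row[col]]
        encoded.append(new_row)
    return encoded
```

Sharing one vocabulary across the three positions means the same building block always gets the same ID, regardless of where it appears in the molecule.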

Data Example

molecule_smiles	is_triazine	buildingblock1_id	buildingblock2_id	buildingblock3_id
"O=C(CCn1ccc(C(=O)N2CCN(c3ncncn…	true	3501	3844	3655
"CC(C)C[C@H](NC(=O)CCn1ccc(C(=O…	true	5933	3844	3655
"CN1CCC(CN([Dy])C(=O)CCn2ccc(C(…	true	4232	3844	3655
"Cc1c(F)ccc(CN([Dy])C(=O)CCn2cc…	true	2689	3844	3655
"O=C(CCn1ccc(C(=O)N2CCN(c3ncncn…	true	4909	3844	3655

DfE external data shares
gov.uk
s3.amazonaws.com
Updated Sep 11, 2025
Cite
Department for Education (2025). DfE external data shares [Dataset]. https://www.gov.uk/government/publications/dfe-external-data-shares
Dataset updated
Sep 11, 2025
Dataset provided by
GOV.UK (http://gov.uk/)
Authors
Department for Education
Description
‘DfE external data shares’ includes:

classification of data – understanding the data we share

DfE external third-party data shares (approved by the Data Sharing Approvals Panel from May 2018 to 30 June 2025)

regular DfE external third-party data shares supported by appropriate data sharing agreements as of June 2025

third-party requests for data from the national pupil database (approved prior to April 2018 where data is still held by the requestor as at 30 June 2025)

data shares with Home Office

data shares with police and criminal investigation authorities

data shares through court orders

DfE also provides external access to data under Section 64, Chapter 5, of the Digital Economy Act 2017 (https://www.legislation.gov.uk/ukpga/2017/30/section/64/enacted). Details of these data shares can be found in the UK Statistics Authority list of accredited projects (https://uksa.statisticsauthority.gov.uk/digitaleconomyact-research-statistics/better-useofdata-for-research-information-for-researchers/list-of-accredited-researchers-and-research-projects-under-the-research-strand-of-the-digital-economy-act/).

Archive

Previous external data shares can be viewed in the National Archives (https://webarchive.nationalarchives.gov.uk/ukgwa/timeline1/https://www.gov.uk/government/publications/dfe-external-data-shares).

The data in the archived documents may not match DfE’s internal data request records due to definitions or business rules changing following process improvements.
PII | External Dataset
kaggle.com
zip
Updated Jan 24, 2024
Cite
moth (2024). PII | External Dataset [Dataset]. https://www.kaggle.com/datasets/alejopaullier/pii-external-dataset
Available download formats: zip (7,923,518 bytes)
Dataset updated
Jan 24, 2024
Authors
moth
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
This is an LLM-generated external dataset for The Learning Agency Lab - PII Data Detection competition.

Versions

v2: Added 1,000+ texts with new types of PII, such as URLs and usernames. The dataset now also includes the PII information as columns. Note that, on purpose, not all of the PII information appears in the text.

Description

It contains 4,434 generated texts (3,382 before the v2 update) with their corresponding annotated labels in the required competition format.

Columns:
- document (str): ID of the essay
- full_text (str): AI-generated text
- tokens (list): list of tokens (obtained with text.split())
- trailing_whitespace (list): list of booleans indicating whether each token is followed by whitespace
- labels (list): list of token labels in BIO format
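The tokens, trailing_whitespace, and labels columns fit together as follows. This is a minimal sketch, assuming entity positions are known as character spans (the span format and the example label names `NAME_STUDENT` and `EMAIL` are illustrative assumptions, not taken from the dataset's generation code). Matching `\S+` reproduces the tokens of `text.split()` for ordinary whitespace while also keeping each token's position.

```python
import re


def tokenize_with_whitespace(text):
    """Split text into the same tokens as str.split(), recording whether each token
    is followed by whitespace (the trailing_whitespace column)."""
    tokens, trailing = [], []
    for m in re.finditer(r"\S+", text):
        tokens.append(m.group())
        trailing.append(m.end() < len(text) and text[m.end()].isspace())
    return tokens, trailing


def bio_labels(text, spans):
    """Token-level BIO labels from character spans [(start, end, label), ...].
    The first token overlapping a span gets B-label, later ones I-label, the rest O."""
    labels = []
    for m in re.finditer(r"\S+", text):
        tag = "O"
        for start, end, label in spans:
            if m.start() < end and m.end() > start:  # token overlaps the entity span
                tag = ("B-" if m.start() <= start else "I-") + label
                break
        labels.append(tag)
    return labels
```

Working at the token level with overlap tests means a token such as `Doe.` still receives an entity label even though the punctuation lies outside the name span.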
Comparisons between the recruited sample and external data.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Jul 9, 2013
Cite
Jones, Ruth; Watkins, Alan; Allen, Steven J.; Storey, Mel; Morgan, Gareth; Russell, Ian T.; Thornton, Catherine A.; Brooks, Caroline J.; Plummer, Sue F.; Heaven, Martin L.; Jordan, Sue; Garaiova, Iveta (2013). Comparisons between the recruited sample and external data. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001716768
Dataset updated
Jul 9, 2013
Authors
Jones, Ruth; Watkins, Alan; Allen, Steven J.; Storey, Mel; Morgan, Gareth; Russell, Ian T.; Thornton, Catherine A.; Brooks, Caroline J.; Plummer, Sue F.; Heaven, Martin L.; Jordan, Sue; Garaiova, Iveta
Description
Notes to table: No correction was made for multiple comparisons.
1 Deprivation (Townsend) scores, ranks and fifths are based on geographical area of residence, using Lower Super Output Areas (LSOAs) defined by postcodes. This measure of material deprivation is calculated from rates of unemployment, vehicle ownership, home ownership, and overcrowding [49].
2 In five cases, both parents were students, and ONS categories could not be allocated. Fathers' occupations were used where no occupation was recorded for the mother [44], [49].
3 As reported by mothers at recruitment at 36 weeks' pregnancy.
4 As in hospital records.
5 Unequal sample sizes, unequal variances.
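Note 1 refers to Townsend scores and fifths. The following is a minimal sketch of the standard Townsend construction, not the authors' code: each of the four area-level percentages is converted to a z-score across areas (unemployment and overcrowding are log-transformed first), and the four z-scores are summed, so higher scores mean more deprivation. The input field names are hypothetical, and the use of the population standard deviation is an assumption.

```python
import math
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of values to mean 0 and (population) standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def townsend_scores(areas):
    """Townsend deprivation score per small area (higher = more deprived).
    Each area is a dict of percentages: unemployment, no_car, not_owner_occupied, overcrowding."""
    unemployment = zscores([math.log(a["unemployment"] + 1) for a in areas])
    no_car = zscores([a["no_car"] for a in areas])
    not_owner = zscores([a["not_owner_occupied"] for a in areas])
    overcrowding = zscores([math.log(a["overcrowding"] + 1) for a in areas])
    return [sum(parts) for parts in zip(unemployment, no_car, not_owner, overcrowding)]


def fifths(scores):
    """Rank areas by score and assign fifths: 1 = least deprived, 5 = most deprived."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    result = [0] * len(scores)
    for rank, idx in enumerate(order):
        result[idx] = rank * 5 // len(scores) + 1
    return result
```

Because every component is standardized across the same set of areas, the scores are relative: the same LSOA can score differently if the reference set of areas changes.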
Data from: Big Data versus a Survey
clevelandfed.org
Updated Dec 31, 2014
Cite
Federal Reserve Bank of Cleveland (2014). Big Data versus a Survey [Dataset]. https://www.clevelandfed.org/publications/working-paper/2014/wp-1440-big-data-versus-a-survey
Dataset updated
Dec 31, 2014
Dataset authored and provided by
Federal Reserve Bank of Cleveland (https://www.clevelandfed.org/)
Description
Economists are shifting attention and resources from work on survey data to work on "big data." This analysis is an empirical exploration of the trade-offs this transition requires. Parallel models are estimated using the Federal Reserve Bank of New York Consumer Credit Panel/Equifax and the Survey of Consumer Finances. After adjustments to account for different variable definitions and sampled populations, it is possible to arrive at similar models of total household debt. However, the estimates are sensitive to the adjustments. Little similarity is observed in parallel models of nonmortgage debt. While surveys intentionally collect theoretically related variables, it may be necessary to merge external data into commercial big data. In this example, some education and income measures are successfully integrated with the big data, but other external aggregates fail to adequately substitute for survey responses. Big data offers sample sizes, frequencies, and details that surveys cannot match. However, this example illustrates why caution is appropriate when attempting to substitute big data for a carefully executed survey.
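Merging external aggregates into record-level big data usually means a left join on a geographic key. This is a minimal sketch, assuming area-level aggregates keyed by a ZIP-style identifier; the key name, the fields `median_income` and `share_college`, and the values are hypothetical and are not the paper's variables. Every record in an area inherits the same area value, which is one reason such aggregates can fall short as substitutes for individual survey responses.

```python
def attach_area_aggregates(records, aggregates, key):
    """Left-join area-level external aggregates onto individual records by a geographic key.
    Returns the merged records and the number of records with no matching area."""
    merged, unmatched = [], 0
    for rec in records:
        area = aggregates.get(rec[key])
        if area is None:
            unmatched += 1  # keep the record, but without external fields
            area = {}
        merged.append({**rec, **area})
    return merged, unmatched
```

Reporting the unmatched count makes coverage gaps in the external source visible instead of silently dropping records.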
Data from: Augmenting the Control Arm of Randomized Trials by Incorporating...
tandf.figshare.com
bin
Updated Oct 10, 2025
Cite
Xun Xu; Ying Yuan; J. Jack Lee (2025). Augmenting the Control Arm of Randomized Trials by Incorporating Multiple External Data Sources Using Propensity Score Stratification and Data-Driven Mixture Prior [Dataset]. http://doi.org/10.6084/m9.figshare.29951984.v1
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.29951984.v1
Dataset updated
Oct 10, 2025
Dataset provided by
Taylor & Francis
Authors
Xun Xu; Ying Yuan; J. Jack Lee
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
To enhance efficiency in drug development, interest in augmenting randomized controlled trials by supplementing the control arm with external data has grown rapidly. However, external data may lack between-population exchangeability. To facilitate proper information borrowing, we propose two two-stage strategies: the stratified propensity score self-adaptive mixture (SPS-SAM) prior and the stratified propensity score calibrated elastic mixture (SPS-CEM) prior. The mixture prior is composed of an informative meta-analytic predictive (MAP) prior and a vague prior. In the first stage, propensity score (PS) stratification is performed to select similar subjects from the external data. Within each stratum, to mitigate measured confounding, we calculate the PS overlap coefficient to account for between-group heterogeneity by adjusting the hyperparameters of the MAP prior. In the second stage, to reduce unmeasured confounding and address potential prior-data conflict, we construct a data-driven mixture prior incorporating an adaptive weight that dynamically controls the proportion of the MAP prior. To obtain the adaptive weight measuring the extent of congruence between the current and the external data, the SPS-SAM prior uses the likelihood ratio test and the SPS-CEM prior uses the scaled t-test. Simulation studies and illustrative examples demonstrate the advantages of the proposed methods over existing methods. Both proposed methods outperform existing methods, yielding smaller bias and greater calibrated power and achieving accurate, efficient, and robust estimation of the treatment effect.
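The second-stage adaptive weight can be illustrated for a binary endpoint. This is a simplified sketch in the spirit of a self-adapting mixture (SAM) prior, not the authors' SPS-SAM implementation: the first-stage PS stratification and overlap-coefficient adjustment are omitted, the MAP prior is assumed to be approximated by a single Beta(a_h, b_h), and the clinically meaningful difference `delta` is a user-chosen assumption. The weight is w = R / (1 + R), where R is the likelihood ratio of theta = theta_h against the better-fitting of theta = theta_h +/- delta.

```python
from math import comb, exp, lgamma


def binom_lik(theta, x, n):
    """Binomial likelihood of x responders out of n at response rate theta."""
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)


def sam_weight(x, n, theta_h, delta):
    """Adaptive weight w = R / (1 + R) on the informative (external) component."""
    if not (0 < theta_h - delta and theta_h + delta < 1):
        raise ValueError("theta_h +/- delta must lie strictly inside (0, 1)")
    r = binom_lik(theta_h, x, n) / max(binom_lik(theta_h + delta, x, n),
                                       binom_lik(theta_h - delta, x, n))
    return r / (1 + r)


def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def mixture_posterior_mean(x, n, w, a_h, b_h, a_v=1.0, b_v=1.0):
    """Posterior mean of theta under the prior w*Beta(a_h, b_h) + (1-w)*Beta(a_v, b_v)."""
    components = [(w, a_h, b_h), (1 - w, a_v, b_v)]
    # Log marginal likelihood under each Beta component (the binomial coefficient cancels).
    log_m = [log_beta(a + x, b + n - x) - log_beta(a, b) for _, a, b in components]
    top = max(log_m)
    post = [wk * exp(lm - top) for (wk, _, _), lm in zip(components, log_m)]
    total = sum(post)
    means = [(a + x) / (a + b + n) for _, a, b in components]
    return sum(p / total * m for p, m in zip(post, means))
```

When the current control data agree with theta_h, the weight exceeds 0.5 and the external information is borrowed heavily; under prior-data conflict it shrinks toward 0 and the vague component dominates.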
The Lick (External Examples, Non-strict)
kaggle.com
zip
Updated Jan 20, 2022
Cite
Andy Chamberlain (2022). The Lick (External Examples, Non-strict) [Dataset]. https://www.kaggle.com/datasets/andychamberlain/the-lick-external-examples-nonstrict
Available download formats: zip (81,833,193 bytes)
Dataset updated
Jan 20, 2022
Authors
Andy Chamberlain
Description
Dataset

This dataset was created by Andy Chamberlain

Contents
Business Activity Survey 2009 - Samoa
microdata.pacificdata.org
Updated Jul 2, 2019
Cite
Samoa Bureau of Statistics (2019). Business Activity Survey 2009 - Samoa [Dataset]. https://microdata.pacificdata.org/index.php/catalog/253
Dataset updated
Jul 2, 2019
Dataset authored and provided by
Samoa Bureau of Statistics
Time period covered
2009
Area covered
Samoa
Description
Abstract

The intention is to collect data for the calendar year 2009 (or the nearest year for which each business keeps its accounts). The survey is considered a one-off survey, although for accurate national accounts (NA), such a survey should be conducted at least every five years to enable regular updating of the ratios, etc., needed to adjust the ongoing indicator data (mainly VAGST) to NA concepts. The questionnaire will be drafted by FSD, largely following the previous BAS, updated to current accounting terminology where necessary. The questionnaire will be pilot tested using some accountants who are likely to complete a number of the forms on behalf of their business clients, and a small sample of businesses. Consultations will also include the Ministry of Finance, the Ministry of Commerce, Industry and Labour, the Central Bank of Samoa (CBS), the Samoa Tourism Authority, the Chamber of Commerce, and other business associations (hotels, retail, etc.).

The questionnaire will collect a number of items of information about the business ownership, locations at which it operates and each establishment for which detailed data can be provided (in the case of complex businesses), contact information, and other general information needed to clearly identify each unique business. The main body of the questionnaire will collect data on income and expenses, to enable value added to be derived accurately. The questionnaire will also collect data on capital formation, and will contain supplementary pages for relevant industries to collect volume of production data for selected commodities and to collect information to enable an estimate of value added generated by key tourism activities.

The principal user of the data will be FSD which will incorporate the survey data into benchmarks for the NA, mainly on the current published production measure of GDP. The information on capital formation and other relevant data will also be incorporated into the experimental estimates of expenditure on GDP. The supplementary data on volumes of production will be used by FSD to redevelop the industrial production index which has recently been transferred under the SBS from the CBS. The general information about the business ownership, etc., will be used to update the Business Register.

Outputs will be produced in a number of formats, including a printed report containing descriptive information of the survey design, data tables, and analysis of the results. The report will also be made available on the SBS website in “.pdf” format, and the tables will be available on the SBS website in excel tables. Data by region may also be produced, although at a higher level of aggregation than the national data. All data will be fully confidentialised, to protect the anonymity of all respondents. Consideration may also be made to provide, for selected analytical users, confidentialised unit record files (CURFs).

A high level of accuracy is needed because the principal purpose of the survey is to develop revised benchmarks for the NA. The initial plan was to conduct the survey as a stratified sample survey, with full enumeration of large establishments and a sample of the remainder.
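A stratified design with full enumeration of large units can be sketched as follows. This is a minimal sketch under stated assumptions, not the SBS selection procedure: the size measure (`employment`), the size threshold, the stratification by `industry`, and the sampling fraction are all hypothetical. Large units form a take-all stratum with design weight 1; the remainder is sampled by simple random sampling within each stratum and weighted by N_h / n_h.

```python
import math
import random
from collections import defaultdict


def take_all_stratified_sample(frame, size_threshold, fraction, seed=2009):
    """Select every unit at or above size_threshold (take-all stratum) and a simple
    random sample of `fraction` of the remaining units within each industry stratum.
    Each selected unit gets a design weight N_h / n_h (1.0 for take-all units)."""
    rng = random.Random(seed)
    sample = []
    remainder = defaultdict(list)
    for unit in frame:
        if unit["employment"] >= size_threshold:
            sample.append({**unit, "weight": 1.0})  # certainty (fully enumerated) unit
        else:
            remainder[unit["industry"]].append(unit)
    for units in remainder.values():
        n_h = max(1, math.ceil(fraction * len(units)))  # at least one unit per stratum
        for unit in rng.sample(units, n_h):
            sample.append({**unit, "weight": len(units) / n_h})
    return sample
```

A useful check on such a design is that the design weights sum to the size of the frame, so weighted totals estimate frame totals.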

Geographic coverage

National Coverage

Analysis unit

The main statistical unit to be used for the survey is the establishment. For simple businesses that undertake a single activity at a single location, there is a one-to-one relationship between the establishment and the enterprise. For large and complex enterprises, however, it is desirable to separate each activity of an enterprise into establishments to provide the most detailed information possible for industrial analysis. The business register will need to be developed in such a way that it records the links between establishments and their parent enterprises. The business register will be created from administrative records and may not have enough information to recognize all establishments of complex enterprises. Large businesses will be contacted prior to the survey post-out to determine if they have separate establishments. If so, the extended structure of the enterprise will be recorded on the business register and a questionnaire will be sent to the enterprise to be completed for each establishment.

SBS has decided to follow the New Zealand simplified version of its statistical units model for the 2009 BAS. Future surveys may consider location units and enterprise groups if they are found to be useful for statistical collections.

It should be noted that while establishment data may enable the derivation of detailed benchmark accounts, it may be necessary to aggregate up to enterprise level data for the benchmarks if the ongoing data used to extrapolate the benchmark forward (mainly VAGST) are only available at the enterprise level.

Universe

The BAS covered all employing units and excluded small non-employing units such as market sellers. The survey also excluded central government agencies engaged in public administration (ministries, public education and health, etc.). It covered only businesses that pay VAGST (threshold SAT$75,000 and upwards).

Kind of data

Sample survey data [ssd]

Sampling procedure

- Total sample size was 1,240.
- Of the 1,240, 902 successfully completed the questionnaire.
- The remaining 338 either never responded or were omitted (some businesses were omitted from the sample because they did not meet the requirements to be surveyed).
- Selection was all employing units paying VAGST (threshold SAT$75,000 upwards).
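As a quick arithmetic check on these figures, using all 1,240 sampled units as the denominator (this includes units omitted as out of scope, so it understates the response rate among eligible units):

```latex
\[
\text{completion rate} = \frac{902}{1240} \approx 0.727 = 72.7\%,
\qquad
\frac{1240 - 902}{1240} = \frac{338}{1240} \approx 27.3\%
\]
```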

Mode of data collection

Mail Questionnaire [mail]

Research instrument

General instructions, authority for the survey, etc;

Business demography information on ownership, contact details, structure, etc.;

Employment;

Income;

Expenses;

Inventories;

Profit or loss and reconciliation to business accounts' profit and loss;

Fixed assets - purchases, disposals, book values

Thank you and signature of respondent.

Supplementary Pages

Additional pages have been prepared to collect data for a limited range of industries.

1. Production data. To rebase and redevelop the Industrial Production Index (IPI), it is intended to collect volume of production information from a selection of large manufacturing businesses. The selection of businesses and products is critical to the usefulness of the IPI. The products must be homogeneous, and be of enough importance to the economy to justify collecting the data. Significance criteria should be established for the selection of products to include in the IPI, and the 2009 BAS provides an opportunity to collect benchmark data for a range of products known to be significant (based on information in the existing IPI, CPI weights, export data, etc.) as well as open questions for respondents to provide information on other significant products.

2. Tourism. There is a strong demand for estimates of tourism value added. To estimate tourism value added using the international standard Tourism Satellite Account methodology requires the use of an input-output table, which is beyond the capacity of SBS at present. However, some indicative estimates of the main parts of the economy influenced by tourism can be derived if the necessary data are collected. Tourism is a demand concept, based on defining tourists (the international standard includes both international and domestic tourists), what products are characteristically purchased by tourists, and which industries supply those products. Some questions targeted at those industries that have significant involvement with tourists (hotels, restaurants, transport and tour operators, vehicle hire, etc.), on how much of their income is sourced from tourism, would provide valuable indicators of the size of the direct impact of tourism.

Cleaning operations

Partial imputation was done at the time of receipt of questionnaires, after follow-up procedures to obtain fully completed questionnaires had been followed. Imputation used a ratio approach, i.e., ratios from responding units in the imputation cell were applied to the partial data that was supplied. Procedures were established during the editing stage (a) to preserve the integrity of the questionnaires as supplied by respondents, and (b) to record all changes made to the questionnaires during editing. If SBS staff wrote on a form, for example, this was to be done only in red pen, to distinguish the alterations from the original information.
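The ratio imputation described above can be sketched as follows. This is a minimal sketch, not the FSD spreadsheet procedure (which was done in Excel): the imputation cell variable (`industry`), the auxiliary variable (`turnover`), and the target (`expenses`) are hypothetical choices. Missing target values are filled as auxiliary x (sum of target / sum of auxiliary) over complete responders in the same cell, and each imputed record is flagged, mirroring the requirement to record all changes.

```python
def ratio_impute(records, target, auxiliary, cell):
    """Impute missing `target` values as auxiliary * ratio, where the ratio is
    sum(target) / sum(auxiliary) over complete responders in the same imputation cell.
    Imputed records are flagged so that every change is recorded."""
    totals = {}
    for r in records:
        if r.get(target) is not None and r.get(auxiliary) is not None:
            t, a = totals.get(r[cell], (0.0, 0.0))
            totals[r[cell]] = (t + r[target], a + r[auxiliary])
    result = []
    for r in records:
        r = dict(r)  # copy, so the record as supplied by the respondent is preserved
        if r.get(target) is None and r.get(auxiliary) is not None:
            t, a = totals.get(r[cell], (0.0, 0.0))
            if a > 0:  # only impute when the cell has usable responders
                r[target] = t * r[auxiliary] / a
                r[f"{target}_imputed"] = True
        result.append(r)
    return result
```

Using the ratio of totals (rather than the mean of individual ratios) keeps the imputed values consistent with the cell's aggregate expense-to-turnover relationship.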

Additional edit checks were developed, including checking against external data at enterprise/establishment level. The external data used for these checks included VAGST records (for turnover and purchases) and SNPF records (for salaries and wages, and employment). Editing and imputation were undertaken by FSD using Excel.
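An edit check against external data can be sketched as a comparison of reported and administrative values. This is a minimal sketch, assuming business-level turnover from the survey and from VAGST records keyed by a common business ID; the 20% relative tolerance is a hypothetical choice, not a documented SBS rule. Flagged units would be queried or reviewed rather than changed automatically.

```python
def flag_against_vagst(survey_turnover, vagst_turnover, tolerance=0.2):
    """Return (business_id, reason) pairs for units whose reported turnover is missing from,
    or differs by more than `tolerance` (relative) from, the VAGST administrative figure."""
    flags = []
    for biz_id, reported in survey_turnover.items():
        admin = vagst_turnover.get(biz_id)
        if admin is None:
            flags.append((biz_id, "no VAGST record"))
        elif admin == 0:
            if reported != 0:
                flags.append((biz_id, "VAGST turnover is zero"))
        elif abs(reported - admin) / admin > tolerance:
            flags.append((biz_id, f"differs from VAGST by {abs(reported - admin) / admin:.0%}"))
    return flags
```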

Sampling error estimates

Not applicable.
Factori People Data | USA | Purchase, Behavior, Intent, Interest | Email,...
datarade.ai
.json, .csv
Cite
Factori, Factori People Data | USA | Purchase, Behavior, Intent, Interest | Email, Address, Income, Insurance, Vehicle, Household | 100+ Attributes [Dataset]. https://datarade.ai/data-products/factori-consumer-graph-data-usa-purchase-behavior-inten-factori
Available download formats: .json, .csv
Dataset authored and provided by
Factori
Area covered
United States
Description
Our People data is gathered and aggregated via surveys, digital services, and public data sources. We use powerful profiling algorithms to collect and ingest only fresh and reliable data points.

Our comprehensive data enrichment solution includes a variety of data sets that can help you address gaps in your People data, gain a deeper understanding of your customers, and power superior client experiences:

1. Geography - City, State, ZIP, County, CBSA, Census Tract, etc.
2. Demographics - Gender, Age Group, Marital Status, Language, etc.
3. Financial - Income Range, Credit Rating Range, Credit Type, Net worth Range, etc.
4. Persona - Consumer type, Communication preferences, Family type, etc.
5. Interests - Content, Brands, Shopping, Hobbies, Lifestyle, etc.
6. Household - Number of Children, Number of Adults, IP Address, etc.
7. Behaviours - Brand Affinity, App Usage, Web Browsing, etc.
8. Firmographics - Industry, Company, Occupation, Revenue, etc.
9. Retail Purchase - Store, Category, Brand, SKU, Quantity, Price, etc.
10. Auto - Car Make, Model, Type, Year, etc.
11. Housing - Home type, Home value, Renter/Owner, Year Built, etc.

People Data Schema & Reach: Our data reach represents the total number of counts available within various categories and comprises attributes such as country location, MAU, DAU & Monthly Location Pings:

Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method on a suitable interval (daily/weekly/monthly).

People Data Use Cases:

360-Degree Customer View: Get a comprehensive picture of customers by aggregating internal and external data.

Data Enrichment: Leverage online-to-offline consumer profiles to build holistic audience segments and improve campaign targeting.

Fraud Detection: Use multiple digital (web and mobile) identities to verify real users and detect anomalies or fraudulent activity.

Advertising & Marketing: Understand audience demographics, interests, lifestyle, hobbies, and behaviors to build targeted marketing campaigns.

Here's the schema of People Data:

person_id first_name last_name age gender linkedin_url twitter_url facebook_url city state address zip zip4 country delivery_point_bar_code carrier_route walk_seuqence_code fips_state_code fips_country_code country_name latitude longtiude address_type metropolitan_statistical_area core_based+statistical_area census_tract census_block_group census_block primary_address pre_address streer post_address address_suffix address_secondline address_abrev census_median_home_value home_market_value property_build+year property_with_ac property_with_pool property_with_water property_with_sewer general_home_value property_fuel_type year month household_id Census_median_household_income household_size marital_status length+of_residence number_of_kids pre_school_kids single_parents working_women_in_house_hold homeowner children adults generations net_worth education_level occupation education_history credit_lines credit_card_user newly_issued_credit_card_user credit_range_new credit_cards loan_to_value mortgage_loan2_amount mortgage_loan_type mortgage_loan2_type mortgage_lender_code mortgage_loan2_render_code mortgage_lender mortgage_loan2_lender mortgage_loan2_ratetype mortgage_rate mortgage_loan2_rate donor investor interest buyer hobby personal_email work_email devices phone employee_title employee_department employee_job_function skills recent_job_change company_id company_name company_description technologies_used office_address office_city office_country office_state office_zip5 office_zip4 office_carrier_route office_latitude office_longitude office_cbsa_code office_census_block_group office_census_tract office_county_code company_phone company_credit_score company_csa_code company_dpbc company_franchiseflag company_facebookurl company_linkedinurl company_twitterurl company_website company_fortune_rank company_government_type company_headquarters_branch company_home_business company_industry company_num_pcs_used company_num_employees company_firm_individual company_msa company_msa_name company_naics_code company_naics_description company_naics_code2 company_naics_description2 company_sic_code2 company_sic_code2_description company_sic_code4 company_sic_code4_description company_sic_code6 company_sic_code6_description company_sic_code8 company_sic_code8_description company_parent_company company_parent_company_location company_public_private company_subsidiary_company company_residential_business_code company_revenue_at_side_code company_revenue_range company_revenue company_sales_volume company_small_business company_stock_ticker company_year_founded company_minorityowned company_female_owned_or_operated company_franchise_code company_dma company_dma_name company_hq_address company_hq_city company_hq_duns company_hq_state company_hq_zip5 company_hq_zip4 company_sect...
Taking Part 2010/11 quarter 4: Statistical release
gov.uk
Updated Aug 9, 2011
Cite
Department for Digital, Culture, Media & Sport (2011). Taking Part 2010/11 quarter 4: Statistical release [Dataset]. https://www.gov.uk/government/statistics/taking-part-the-national-survey-of-culture-leisure-and-sport-2010-11
Dataset updated
Aug 9, 2011
Dataset provided by
GOV.UK (http://gov.uk/)
Authors
Department for Digital, Culture, Media & Sport
Description
The latest estimates from the 2010/11 Taking Part adult survey produced by DCMS were released on 30 June 2011 according to the arrangements approved by the UK Statistics Authority.

Released:

30 June 2011

Period covered:

April 2010 to April 2011

Geographic coverage:

National and Regional level data for England.

Next release date:

Further analysis of the 2010/11 adult dataset and data for child participation will be published on 18 August 2011.

Summary

The latest data from the 2010/11 Taking Part survey provides reliable national estimates of adult engagement with sport, libraries, the arts, heritage and museums & galleries. This release also presents analysis on volunteering and digital participation in our sectors and a look at cycling and swimming proficiency in England. The Taking Part survey is a continuous annual survey of adults and children living in private households in England, and carries the National Statistics badge, meaning that it meets the highest standards of statistical quality.

Statistical Report

Taking Part: The National Survey of Culture, Leisure and Sport 2010/11 (PDF 713kb): http://www.culture.gov.uk/images/research/Taking_Part_Y6_Release.pdf

Taking Part: The National Survey of Culture, Leisure and Sport 2010/11 (Word 674kb): http://www.culture.gov.uk/images/research/Taking_Part_Y6_Release.doc

Statistical Worksheets

These spreadsheets contain the data and sample sizes for each sector included in the survey:

The Arts (Excel 79kb): http://www.culture.gov.uk/images/research/Y6_Q4_Figures_Arts.xls

Cycling and swimming proficiency (Excel 38kb): http://www.culture.gov.uk/images/research/Y6_Q4_Figures_Cycling_and_swimming_proficiency.xls

Digital Participation (Excel 51kb): http://www.culture.gov.uk/images/research/Y6_Q4_Figures_Digital_participation.xls

Heritage (Excel 45kb): http://www.culture.gov.uk/images/research/Y6_Q4_Figures_Heritage.xls

Libraries (Excel 42kb): http://www.culture.gov.uk/images/research/Y6_Q4_Figures_Libraries.xls

Museums and Galleries (Excel 43kb): http://www.culture.gov.uk/images/research/Y6_Q4_Museums_and_Galleries2.xls

Sport (Excel 44kb): http://www.culture.gov.uk/images/research/Y6_Q4_Sports2.xls

Volunteering (Excel 87kb): http://www.culture.gov.uk/images/research/Y6_Q4_Figures_Volunteering.xls

Previous release

The previous Taking Part release was published on 31 March 2011 and can be found online.

Taking Part: The National Survey of Culture, Leisure and Sport January - December 2010: http://www.culture.gov.uk/publications/7995.aspx

The UK Statistics Authority

This release is published in accordance with the Code of Practice for Official Statistics (2009), as produced by the UK Statistics Authority (UKSA, http://www.statisticsauthority.gov.uk/). The UKSA has the overall objective of promoting and safeguarding the production and publication of official statistics that serve the public good. It monitors and reports on all official statistics, and promotes good practice in this area.

Pre-release access

The document below contains a list of Ministers and Officials who have received privileged early access to this release of Taking Part data. In line with best practice, the list has been kept to a minimum and those given access for briefing purposes had a maximum of 24 hours.

Pre-release access list (PDF 7kb): http://www.culture.gov.uk/images/publications/TP_Pre-release_access_300611.pdf

The responsible statistician for this release is Neil Wilson. For any queries please contact the Taking Part team on 020 7211 6968 or takingpart@culture.gsi.gov.uk.

Related information

Taking Part Survey Questionnaires: http://www.culture.gov.uk/what_we_do/research_and_statistics/7387.aspx

Taking Part Technical Reports: http://www.culture.gov.uk/what_we_do/research_and_statistics/7388.aspx
Data from: Indigenous Peoples' Data During COVID-19: From External to...
data.opendevelopmentmekong.net
Updated Mar 29, 2021
Cite
(2021). Indigenous Peoples' Data During COVID-19: From External to Internal [Dataset]. https://data.opendevelopmentmekong.net/dataset/indigenous-peoples-data-during-covid-19-from-external-to-internal
Dataset updated
Mar 29, 2021
Description
This paper explores the particular issues that COVID-19 has highlighted for Indigenous Peoples focusing on data for governance. Drawing on current global examples, we underscore the inclusion of Indigenous Peoples in COVID-19 activities as the basis of data-related policy recommendations to increase the use of timely, relevant data for decision-making while reducing risk and harms.
Data from: Statistical methods to control for confounders in rare disease...
tandf.figshare.com
odt
Updated Jun 19, 2025
Cite
Jiwei He; Di Zhang; Feng Li (2025). Statistical methods to control for confounders in rare disease settings that use external control [Dataset]. http://doi.org/10.6084/m9.figshare.25736878.v1
Available download formats: odt
Unique identifier
https://doi.org/10.6084/m9.figshare.25736878.v1
Dataset updated
Jun 19, 2025
Dataset provided by
Taylor & Francis (https://taylorandfrancis.com/)
Authors
Jiwei He; Di Zhang; Feng Li
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
In drug development for rare diseases, the number of treated subjects in a clinical trial is often very small, whereas the number of external controls can be relatively large. There is no clear guidance on choosing an appropriate statistical method to control baseline confounding in this situation. To fill this gap, we conduct extensive simulations to evaluate the performance of commonly used matching and weighting methods, as well as the more recently developed targeted maximum likelihood estimation (TMLE) and cardinality matching, in small-sample settings that mimic the motivating data from a pediatric rare disease. Among the methods examined, coarsened exact matching (CEM) and TMLE perform relatively robustly under various model specifications. CEM is only feasible when the number of controls far exceeds the number of treated subjects, whereas TMLE performs better with less extreme treatment allocation ratios. Our simulations suggest that the bootstrap is useful for variance estimation in small samples after matching.
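Coarsened exact matching and a bootstrap standard error can be sketched in a few lines. This is a minimal sketch, not the authors' simulation code: the covariate names, cutpoints, and data are hypothetical. Each covariate is coarsened into bins, units are grouped by their bin signature, strata lacking either treated or control units are dropped, and the ATT is the treated-count-weighted average of within-stratum mean differences. The bootstrap resamples units with replacement and repeats the whole matching step.

```python
import random
import statistics
from collections import defaultdict


def coarsen(value, cutpoints):
    """Bin index of value given sorted cutpoints (number of cutpoints <= value)."""
    return sum(value >= c for c in cutpoints)


def cem_att(units, cutpoints):
    """ATT estimate after coarsened exact matching.
    units: dicts with 'treated' (bool), outcome 'y', and the covariates named in cutpoints.
    Returns (estimate, number of matched treated units)."""
    strata = defaultdict(lambda: ([], []))  # bin signature -> (treated ys, control ys)
    for u in units:
        key = tuple(coarsen(u[var], cps) for var, cps in cutpoints.items())
        strata[key][0 if u["treated"] else 1].append(u["y"])
    total, n_matched = 0.0, 0
    for treated, controls in strata.values():
        if treated and controls:  # drop strata lacking either group
            diff = statistics.fmean(treated) - statistics.fmean(controls)
            total += diff * len(treated)
            n_matched += len(treated)
    return (total / n_matched if n_matched else float("nan")), n_matched


def bootstrap_se(units, cutpoints, reps=200, seed=1):
    """Nonparametric bootstrap standard error of the CEM ATT estimate."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        resample = [rng.choice(units) for _ in units]
        est, n = cem_att(resample, cutpoints)
        if n:  # skip resamples with no matched stratum
            estimates.append(est)
    return statistics.stdev(estimates) if len(estimates) > 1 else float("nan")
```

With few treated subjects and many controls, most controls end up in strata without treated units and are discarded, which is why CEM needs the control pool to far exceed the treated group.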
Table1_Licensing of Orphan Medicinal Products—Use of Real-World Data and...
figshare.com
frontiersin.figshare.com
xlsx
Updated Jun 16, 2023
Frauke Naumann-Winter; Franziska Wolter; Ulrike Hermes; Eva Malikova; Nils Lilienthal; Tania Meier; Maria Elisabeth Kalland; Armando Magrelli (2023). Table1_Licensing of Orphan Medicinal Products—Use of Real-World Data and Other External Data on Efficacy Aspects in Marketing Authorization Applications Concluded at the European Medicines Agency Between 2019 and 2021.XLSX [Dataset]. http://doi.org/10.3389/fphar.2022.920336.s002
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fphar.2022.920336.s002
Dataset updated
Jun 16, 2023
Dataset provided by
Frontiers
Authors
Frauke Naumann-Winter; Franziska Wolter; Ulrike Hermes; Eva Malikova; Nils Lilienthal; Tania Meier; Maria Elisabeth Kalland; Armando Magrelli
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Reference to so-called real-world data is made more often in marketing authorization applications for medicines intended to diagnose, prevent, or treat rare diseases than in those for more common diseases. We provide granularity on the type and aim of any external data on efficacy aspects, from both real-world data sources and external trial data, as discussed in regulatory submissions of orphan-designated medicinal products in the EU. By quantifying the contribution of external data according to various regulatory characteristics, we aimed to identify specific opportunities for external data in the field of orphan conditions.
Methods: Information on external data was extracted from regulatory documents covering 72 orphan designations. Our sample comprised public assessment reports for approved, refused, or withdrawn applications concluded from 2019 to 2021 at the European Medicines Agency. Products with an active orphan designation at the time of submission were scrutinized regarding the role of external data on efficacy aspects in the context of marketing authorization applications, or on the criterion of “significant benefit” for the confirmation of the orphan designation at the time of licensing. The reports allowed a broad distinction between clinical development, regulatory decision making, and intended post-approval data collection. We defined three categories of external data (administrative data, structured clinical data, and external trial data from clinical trials not sponsored by the applicant) and noted whether external data concerned the therapeutic context of the disease or the product under review.
Results: While reference to external data with respect to efficacy aspects was included for 63% of the approved medicinal products in the field of rare diseases, 37% of marketing authorization applications were based exclusively on the dedicated clinical development plan for the product under review. Purely administrative data did not play any role in our sample of reports, but clinical data collected in a structured manner (from routine care or clinical research) were often used to inform the trial design. Two additional recurrent themes for the use of external data were the contextualization of results, especially to confirm the orphan designation at the time of licensing, and reassurance of a large difference in treatment effect size or of consistency of effects observed in clinical trials and practice. External data on the product under review were restricted either to active substances already belonging to the standard of care even before authorization or to compassionate use schemes. Furthermore, external data were considered pivotal for marketing authorization only exceptionally, and only for active substances already in use within the specific therapeutic indication. Applications for the rarest conditions and those without authorized treatment alternatives were especially prominent with respect to the use of external data from real-world data sources, in both the pre- and post-approval settings.
Conclusion: Specific opportunities for external data in the setting of marketing authorizations in the field of rare diseases were identified. Ongoing initiatives fostering systematic data collection are promising steps toward more efficient medicinal product development in the field of rare diseases.
Synthetic Data for an Imaginary Country, Sample, 2023 - World
microdata.worldbank.org
nada-demo.ihsn.org
Updated Jul 7, 2023
Development Data Group, Data Analytics Unit (2023). Synthetic Data for an Imaginary Country, Sample, 2023 - World [Dataset]. https://microdata.worldbank.org/index.php/catalog/5906
Dataset updated
Jul 7, 2023
Dataset authored and provided by
Development Data Group, Data Analytics Unit
Time period covered
2023
Area covered
World
Description
Abstract

The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, asset ownership). The data include only ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.

The full-population dataset (with about 10 million individuals) is also distributed as open data.

Geographic coverage

The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.

Analysis unit

Household, Individual

Universe

The dataset is a fully synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.

Kind of data

Sample survey data [ssd]

Sampling procedure

The sample size was set to 8,000 households, and the fixed number of households to be selected from each enumeration area was set to 25. In the first stage, the number of enumeration areas to be selected in each stratum was calculated in proportion to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
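The two-stage design can be sketched in Python. With 25 households per enumeration area (EA), 8,000 households correspond to 8,000 / 25 = 320 EAs. This is a hypothetical re-creation, not the R script distributed with the dataset: it assumes stratum size is measured in households, that EAs are drawn by simple random sampling within each stratum, and that largest-remainder rounding makes the proportional allocation sum to exactly 320. The frame layout and function names are illustrative.

```python
import random
from math import floor


def allocate_eas(stratum_sizes, total_eas):
    """Split total_eas across strata in proportion to stratum size.

    Largest-remainder rounding keeps the allocation summing to total_eas.
    """
    total = sum(stratum_sizes.values())
    quotas = {s: total_eas * n / total for s, n in stratum_sizes.items()}
    alloc = {s: floor(q) for s, q in quotas.items()}
    shortfall = total_eas - sum(alloc.values())
    # Hand leftover EAs to the strata with the largest fractional remainders.
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:shortfall]:
        alloc[s] += 1
    return alloc


def draw_sample(frame, total_households=8000, per_ea=25, seed=1):
    """Two-stage stratified sample.

    frame: {stratum: {ea_id: [household_id, ...]}}, where a stratum might be
    a (geo_1, urban_rural) pair. Returns a list of (stratum, ea_id, household_id).
    """
    rng = random.Random(seed)
    n_eas = total_households // per_ea  # 8000 // 25 = 320 EAs
    sizes = {s: sum(len(hh) for hh in eas.values()) for s, eas in frame.items()}
    sample = []
    for stratum, k in allocate_eas(sizes, n_eas).items():
        # Stage 1: simple random sample of k EAs able to supply per_ea households.
        eligible = sorted(ea for ea, hh in frame[stratum].items() if len(hh) >= per_ea)
        for ea in rng.sample(eligible, k):
            # Stage 2: per_ea households drawn without replacement within the EA.
            for hh in rng.sample(frame[stratum][ea], per_ea):
                sample.append((stratum, ea, hh))
    return sample
```

Fixing the take per EA at 25 keeps field workloads equal across EAs; proportional allocation of EAs then keeps each stratum's share of the sample close to its share of households.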

Mode of data collection

other

Research instrument

The dataset is a synthetic dataset. Although it contains variables typically collected through sample surveys or population censuses, no questionnaire is available for this dataset. However, a "fake" questionnaire was created for the sample dataset extracted from this dataset, to be used as training material.

Cleaning operations

The synthetic data generation process included a set of "validators" (consistency checks against which synthetic observations were assessed and, when needed, rejected and replaced). Some post-processing was also applied to produce the distributed data files.
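The documentation does not list the validators themselves, so the following is only a minimal sketch of the reject-and-replace idea: each candidate record from a generator is checked against a list of consistency rules and redrawn if any rule fails. The three rules, the field names, and the toy random generator standing in for REaLTabFormer are all hypothetical.

```python
import random

# Hypothetical consistency rules for an individual record; the dataset's
# actual validators are not listed in its documentation.
VALIDATORS = [
    lambda r: 0 <= r["age"] <= 110,
    lambda r: not (r["age"] < 15 and r["marital_status"] == "married"),
    lambda r: r["years_of_schooling"] <= max(r["age"] - 3, 0),
]


def is_valid(record):
    """A record is accepted only if every validator returns True."""
    return all(check(record) for check in VALIDATORS)


def generate_valid(generator, n, max_tries=100):
    """Draw n records, rejecting and replacing any record that fails a check."""
    records = []
    for _ in range(n):
        for _ in range(max_tries):
            candidate = generator()
            if is_valid(candidate):
                records.append(candidate)
                break
        else:
            raise RuntimeError(f"no valid record after {max_tries} tries")
    return records


def make_toy_generator(seed=0):
    """Hypothetical stand-in for a trained generative model such as REaLTabFormer."""
    rng = random.Random(seed)

    def generate():
        return {
            "age": rng.randint(0, 90),
            "marital_status": rng.choice(["single", "married"]),
            "years_of_schooling": rng.randint(0, 20),
        }

    return generate
```

Capping the number of retries makes a rule that can never be satisfied fail loudly instead of looping forever.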

Response rate

This is a synthetic dataset; the "response rate" is 100%.
Analysis of research data for 11 insitutions - Data Monitor
elsevier.digitalcommonsdata.com
Updated May 29, 2020
Elena Zudilova-Seinstra (2020). Analysis of research data for 11 insitutions - Data Monitor [Dataset]. http://doi.org/10.17632/k5p45z33kb.2
Unique identifier
https://doi.org/10.17632/k5p45z33kb.2
Dataset updated
May 29, 2020
Authors
Elena Zudilova-Seinstra
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We conducted an analysis to confirm our observation that only a very small percentage of public research data is hosted in institutional data repositories, while the vast majority is published in open domain-specific and generalist data repositories.

For this analysis, we selected 11 institutions, many of which have been our evaluation partners. For each institution, we counted the number of datasets published in its Institutional Data Repository (IDR) and tracked the number of public research datasets hosted in external data repositories via the Data Monitor API. External tracking was based on a corpus of more than 14 million data records checked against the institution's SciVal ID. One institution did not have an IDR.

We found that 10 of the 11 institutions had most of their public research data hosted outside the institution. Here, research data means not only datasets but a broader notion that includes, for example, software.

We are happy to expand the analysis with more institutions upon request.

Note: This is version 2 of the earlier published dataset. The number of datasets published and tracked in the Monash Institutional Data Repository has been updated based on the information provided by the Monash Library. The number of datasets in the NTU Institutional Data Repository now includes datasets only. Dataverses were excluded to avoid double counting.
Factori | US People Data - Acquisition Marketing & People Data Insights |...
datarade.ai
.json, .csv
Updated Jul 23, 2022
Factori (2022). Factori | US People Data - Acquisition Marketing & People Data Insights | Append 100+ Attributes from 220M+ Consumer Profiles [Dataset]. https://datarade.ai/data-products/factori-usa-consumer-graph-data-acquisition-marketing-a-factori
Available download formats: .json, .csv
Dataset updated
Jul 23, 2022
Dataset authored and provided by
Factori
Area covered
United States of America
Description
Our People data is gathered and aggregated via surveys, digital services, and public data sources. We use powerful profiling algorithms to collect and ingest only fresh and reliable data points.

Our comprehensive data enrichment solution includes a variety of data sets that can help you address gaps in your People data, gain a deeper understanding of your customers, and power superior client experiences.

1. Geography - City, State, ZIP, County, CBSA, Census Tract, etc.
2. Demographics - Gender, Age Group, Marital Status, Language, etc.
3. Financial - Income Range, Credit Rating Range, Credit Type, Net worth Range, etc.
4. Persona - Consumer type, Communication preferences, Family type, etc.
5. Interests - Content, Brands, Shopping, Hobbies, Lifestyle, etc.
6. Household - Number of Children, Number of Adults, IP Address, etc.
7. Behaviours - Brand Affinity, App Usage, Web Browsing, etc.
8. Firmographics - Industry, Company, Occupation, Revenue, etc.
9. Retail Purchase - Store, Category, Brand, SKU, Quantity, Price, etc.
10. Auto - Car Make, Model, Type, Year, etc.
11. Housing - Home type, Home value, Renter/Owner, Year Built, etc.

People Data Schema & Reach: Our data reach represents the total number of counts available within various categories and comprises attributes such as country location, MAU, DAU & Monthly Location Pings:

Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method on a suitable interval (daily/weekly/monthly).

People Data Use Cases:

360-Degree Customer View: Get a comprehensive image of customers by means of internal and external data aggregation.

Data Enrichment: Leverage online-to-offline consumer profiles to build holistic audience segments and improve campaign targeting through user data enrichment.

Fraud Detection: Use multiple digital (web and mobile) identities to verify real users and detect anomalies or fraudulent activity.

Advertising & Marketing: Understand audience demographics, interests, lifestyle, hobbies, and behaviors to build targeted marketing campaigns.

Here's the schema of People Data: person_id first_name last_name age gender linkedin_url twitter_url facebook_url city state address zip zip4 country delivery_point_bar_code carrier_route walk_seuqence_code fips_state_code fips_country_code country_name latitude longtiude address_type metropolitan_statistical_area core_based+statistical_area census_tract census_block_group census_block primary_address pre_address streer post_address address_suffix address_secondline address_abrev census_median_home_value home_market_value property_build+year property_with_ac property_with_pool property_with_water property_with_sewer general_home_value property_fuel_type year month household_id Census_median_household_income household_size marital_status length+of_residence number_of_kids pre_school_kids single_parents working_women_in_house_hold homeowner children adults generations net_worth education_level occupation education_history credit_lines credit_card_user newly_issued_credit_card_user credit_range_new
credit_cards loan_to_value mortgage_loan2_amount mortgage_loan_type
mortgage_loan2_type mortgage_lender_code
mortgage_loan2_render_code
mortgage_lender mortgage_loan2_lender
mortgage_loan2_ratetype mortgage_rate
mortgage_loan2_rate donor investor interest buyer hobby personal_email work_email devices phone employee_title employee_department employee_job_function skills recent_job_change company_id company_name company_description technologies_used office_address office_city office_country office_state office_zip5 office_zip4 office_carrier_route office_latitude office_longitude office_cbsa_code
office_census_block_group
office_census_tract office_county_code
company_phone
company_credit_score
company_csa_code
company_dpbc
company_franchiseflag
company_facebookurl company_linkedinurl company_twitterurl
company_website company_fortune_rank
company_government_type company_headquarters_branch company_home_business
company_industry
company_num_pcs_used
company_num_employees
company_firm_individual company_msa company_msa_name
company_naics_code
company_naics_description
company_naics_code2 company_naics_description2
company_sic_code2
company_sic_code2_description
company_sic_code4 company_sic_code4_description
company_sic_code6
company_sic_code6_description
company_sic_code8
company_sic_code8_description company_parent_company
company_parent_company_location company_public_private company_subsidiary_company company_residential_business_code company_revenue_at_side_code company_revenue_range
company_revenue company_sales_volume
company_small_business company_stock_ticker company_year_founded company_minorityowned
company_female_owned_or_operated company_franchise_code company_dma company_dma_name
company_hq_address
company_hq_city company_hq_duns company_hq_state
company_hq_zip5 company_hq_zip4 company_sec...
Factori AI & ML Training Data | People Data | USA | Machine Learning Data
datarade.ai
.json, .csv
Updated Jul 23, 2022
Factori (2022). Factori AI & ML Training Data | People Data | USA | Machine Learning Data [Dataset]. https://datarade.ai/data-products/factori-ai-ml-training-data-consumer-data-usa-machine-factori
Available download formats: .json, .csv
Dataset updated
Jul 23, 2022
Dataset authored and provided by
Factori
Area covered
United States of America
Description
Our People data is gathered and aggregated via surveys, digital services, and public data sources. We use powerful profiling algorithms to collect and ingest only fresh and reliable data points.

Our comprehensive data enrichment solution includes a variety of data sets that can help you address gaps in your customer data, gain a deeper understanding of your customers, and power superior client experiences.

Geography - City, State, ZIP, County, CBSA, Census Tract, etc.

Demographics - Gender, Age Group, Marital Status, Language etc.

Financial - Income Range, Credit Rating Range, Credit Type, Net worth Range, etc

Persona - Consumer type, Communication preferences, Family type, etc

Interests - Content, Brands, Shopping, Hobbies, Lifestyle etc.

Household - Number of Children, Number of Adults, IP Address, etc.

Behaviours - Brand Affinity, App Usage, Web Browsing etc.

Firmographics - Industry, Company, Occupation, Revenue, etc

Retail Purchase - Store, Category, Brand, SKU, Quantity, Price etc.

Auto - Car Make, Model, Type, Year, etc.

Housing - Home type, Home value, Renter/Owner, Year Built etc.

People Data Schema & Reach: Our data reach represents the total number of counts available within various categories and comprises attributes such as country location, MAU, DAU & Monthly Location Pings:

Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method on a suitable interval (daily/weekly/monthly).

People Data Use Cases:

360-Degree Customer View: Get a comprehensive image of customers by means of internal and external data aggregation.

Data Enrichment: Leverage online-to-offline consumer profiles to build holistic audience segments and improve campaign targeting through user data enrichment.

Fraud Detection: Use multiple digital (web and mobile) identities to verify real users and detect anomalies or fraudulent activity.

Advertising & Marketing: Understand audience demographics, interests, lifestyle, hobbies, and behaviors to build targeted marketing campaigns.

Here's the schema of People Data: person_id first_name last_name age gender linkedin_url twitter_url facebook_url city state address zip zip4 country delivery_point_bar_code carrier_route walk_seuqence_code fips_state_code fips_country_code country_name latitude longtiude address_type metropolitan_statistical_area core_based+statistical_area census_tract census_block_group census_block primary_address pre_address streer post_address address_suffix address_secondline address_abrev census_median_home_value home_market_value property_build+year property_with_ac property_with_pool property_with_water property_with_sewer general_home_value property_fuel_type year month household_id Census_median_household_income household_size marital_status length+of_residence number_of_kids pre_school_kids single_parents working_women_in_house_hold homeowner children adults generations net_worth education_level occupation education_history credit_lines credit_card_user newly_issued_credit_card_user credit_range_new
credit_cards loan_to_value mortgage_loan2_amount mortgage_loan_type
mortgage_loan2_type mortgage_lender_code
mortgage_loan2_render_code
mortgage_lender mortgage_loan2_lender
mortgage_loan2_ratetype mortgage_rate
mortgage_loan2_rate donor investor interest buyer hobby personal_email work_email devices phone employee_title employee_department employee_job_function skills recent_job_change company_id company_name company_description technologies_used office_address office_city office_country office_state office_zip5 office_zip4 office_carrier_route office_latitude office_longitude office_cbsa_code
office_census_block_group
office_census_tract office_county_code
company_phone
company_credit_score
company_csa_code
company_dpbc
company_franchiseflag
company_facebookurl company_linkedinurl company_twitterurl
company_website company_fortune_rank
company_government_type company_headquarters_branch company_home_business
company_industry
company_num_pcs_used
company_num_employees
company_firm_individual company_msa company_msa_name
company_naics_code
company_naics_description
company_naics_code2 company_naics_description2
company_sic_code2
company_sic_code2_description
company_sic_code4 company_sic_code4_description
company_sic_code6
company_sic_code6_description
company_sic_code8
company_sic_code8_description company_parent_company
company_parent_company_location company_public_private company_subsidiary_company company_residential_business_code company_revenue_at_side_code company_revenue_range
company_revenue company_sales_volume
company_small_business company_stock_ticker company_year_founded company_minorityowned
company_female_owned_or_operated company_franchise_code company_dma company_dma_name
company_hq_address
company_hq_city company_hq_duns company_hq_state
company_hq_zip5 company_hq_zip4 company_se...
Data Management Plan Examples Database
search.dataone.org
borealisdata.ca
Updated Sep 4, 2024
Evering, Danica; Acharya, Shrey; Pratt, Isaac; Behal, Sarthak (2024). Data Management Plan Examples Database [Dataset]. http://doi.org/10.5683/SP3/SDITUG
Unique identifier
https://doi.org/10.5683/SP3/SDITUG
Dataset updated
Sep 4, 2024
Dataset provided by
Borealis
Authors
Evering, Danica; Acharya, Shrey; Pratt, Isaac; Behal, Sarthak
Time period covered
Jan 1, 2011 - Jan 1, 2023
Description
This dataset comprises a collection of example DMPs from a wide array of fields, obtained from a number of different sources outlined below. Data extracted from the examples include the discipline and field of study, author, institutional affiliation and funding information, location, date created, title, research and data type, description of the project, link to the DMP, and, where possible, external links to related publications or grant pages. This CSV document serves as the content for a McMaster Data Management Plan (DMP) Database as part of the Research Data Management (RDM) Services website, located at https://u.mcmaster.ca/dmps. Other universities and organizations are encouraged to link to the DMP Database or use this dataset as the content for their own DMP Database. This dataset will be updated regularly to include new additions and will be versioned accordingly. We are gathering submissions at https://u.mcmaster.ca/submit-a-dmp to continue to expand the collection.
2021 Census: Safeguarded Household Microdata Sample (England and Wales)
beta.ukdataservice.ac.uk
datacatalogue.ukdataservice.ac.uk
Updated Dec 17, 2024
Office for National Statistics (2024). 2021 Census: Safeguarded Household Microdata Sample (England and Wales) [Dataset]. http://doi.org/10.5255/UKDA-SN-9156-1
Unique identifier
https://doi.org/10.5255/UKDA-SN-9156-1
Dataset updated
Dec 17, 2024
Dataset provided by
UK Data Service (https://ukdataservice.ac.uk/)
Authors
Office for National Statistics
Time period covered
Mar 21, 2021
Area covered
Wales, England
Description
The 2021 UK Census was the 23rd official census of the United Kingdom. The UK Census is generally conducted once every 10 years, and the 2021 censuses of England, Wales, and Northern Ireland took place on 21 March 2021. In Scotland, the decision was made to move the census to March 2022 because of the impact of the coronavirus pandemic (see SNs 9461 and 9462). The censuses were administered by the Office for National Statistics (ONS), the Northern Ireland Statistics and Research Agency (NISRA) and National Records of Scotland (NRS), respectively.
Census 2021 was the first census with a digital-first design, encouraging participants to respond online rather than on a paper questionnaire. Support was given to people who could not respond online, including paper questionnaires, telephone contact centres, field force support, and an extended collection period.
Topics covered in the 2021 UK Census included:
demography and migration
ethnic group, national identity, language and religion
labour market and travel to work
housing
education
health, disability, and unpaid care
Welsh and other languages
UK armed forces veterans
sexual orientation and gender identity.

The 2021 Census: Safeguarded Household Microdata Sample dataset consists of a random sample of 1% of households from the 2021 Census and contains records for all individuals within these sampled households. It includes records for 263,729 households and 606,210 persons. These data cover England and Wales only. This sample allows linkage between individuals in the same household. The lowest level of geography is Wales and regions within England. It contains 56 variables and a low level of detail. This is a new ONS product following user feedback from the 2011 Census.
Census Microdata
Microdata are small samples of individual records from a single census from which identifying information has been removed. They contain a range of individual and household characteristics and can be used to carry out analysis not possible from standard census outputs, such as:
creating tables using bespoke variable combinations
investigating specific combinations of variables or categories in a high level of detail
conducting non-tabular statistical analyses on record-level data.
The microdata samples are designed to protect the confidentiality of individuals and households. This is done by applying access controls and removing information that might directly identify a person, such as names, addresses and date of birth. Record swapping is applied to the census data used to create the microdata samples. This is a statistical disclosure control (SDC) method, which makes very small changes to the data to prevent the identification of individuals. The microdata samples use further SDC methods, such as collapsing variables and restricting detail. The samples also include records that have been edited to prevent inconsistent data and contain imputed persons, households, and data values. To protect confidentiality, imputation flags are not included in any 2021 Census microdata sample.
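Record swapping is described here only in outline. As a generic illustration, and not the ONS procedure (whose selection and matching rules are not given in this description), the sketch below swaps the area code between randomly chosen pairs of households that have the same household size but live in different areas. Matching on household size, the 2% swap rate, and the field names are assumptions for illustration.

```python
import random
from collections import defaultdict


def swap_records(households, rate=0.02, seed=0):
    """Swap the area of randomly chosen pairs of households.

    Each pair shares the same household size but lives in different areas,
    so the area-by-size counts are unchanged. Roughly `rate` of all
    households are swapped. Returns (new_households, number_of_pairs).
    """
    rng = random.Random(seed)
    data = [dict(h) for h in households]  # leave the input untouched
    by_size = defaultdict(list)
    for i, h in enumerate(data):
        by_size[h["size"]].append(i)

    target_pairs = int(len(data) * rate / 2)
    order = list(range(len(data)))
    rng.shuffle(order)
    used = set()
    pairs = 0
    for i in order:
        if pairs >= target_pairs:
            break
        if i in used:
            continue
        # Candidate partners: unswapped, same size, different area.
        partners = [
            j for j in by_size[data[i]["size"]]
            if j not in used and data[j]["area"] != data[i]["area"]
        ]
        if not partners:
            continue
        j = rng.choice(partners)
        data[i]["area"], data[j]["area"] = data[j]["area"], data[i]["area"]
        used.update((i, j))
        pairs += 1
    return data, pairs
```

Because each swap exchanges areas between two households of the same size, the area-by-size table is preserved exactly while individual records no longer sit in their true area, which is the sense in which swapping makes "very small changes" to the data.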

