100+ datasets found
  1. Frequently leveraged external data sources for global enterprises 2020

    • statista.com
    Updated Jul 1, 2025
    Cite
    Statista (2025). Frequently leveraged external data sources for global enterprises 2020 [Dataset]. https://www.statista.com/statistics/1235514/worldwide-popular-external-data-sources-companies/
    Dataset updated
    Jul 1, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Aug 2020
    Area covered
    Worldwide
    Description

    According to respondents surveyed in 2020, data masters typically leverage a variety of external data sources to enhance their insights. The most popular external data sources for data masters are publicly available competitor data, open data, and proprietary datasets from data aggregators, with **, **, and ** percent, respectively.

  2. sEH External Data

    • kaggle.com
    zip
    Updated Jun 7, 2024
    Cite
    kyu999 (2024). sEH External Data [Dataset]. https://www.kaggle.com/datasets/kyu999/seh-external-data
    Available download formats: zip (57251119 bytes)
    Dataset updated
    Jun 7, 2024
    Authors
    kyu999
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

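    Dataset produced by applying the following processing to the sEH external data:

    • Replaced the DNA linker with Dy
    • Added a binds column
    • Added an is_triazine column
    • Converted each building block to an id
    • Converted molecule_smiles to canonical form (just as a precaution)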

    Data Example

    read_count | molecule_smiles | binds | is_triazine | buildingblock1_id buildingblock2_id buildingblock3_id
    0 | "O=C(CCn1ccc(C(=O)N2CCN(c3ncncn… | 0 | true | 350138443655
    0 | "CC(C)C[C@H](NC(=O)CCn1ccc(C(=O… | 0 | true | 593338443655
    0 | "CN1CCC(CN([Dy])C(=O)CCn2ccc(C(… | 0 | true | 423238443655
    0 | "Cc1c(F)ccc(CN([Dy])C(=O)CCn2cc… | 0 | true | 268938443655
    0 | "O=C(CCn1ccc(C(=O)N2CCN(c3ncncn… | 0 | true | 490938443655
  3. DfE external data shares

    • gov.uk
    • s3.amazonaws.com
    Updated Sep 11, 2025
    Cite
    Department for Education (2025). DfE external data shares [Dataset]. https://www.gov.uk/government/publications/dfe-external-data-shares
    Dataset updated
    Sep 11, 2025
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department for Education
    Description

    ‘DfE external data shares’ includes:

    • classification of data – understanding the data we share
    • DfE external third-party data shares (approved by the Data Sharing Approvals Panel from May 2018 to 30 June 2025)
    • regular DfE external third-party data shares supported by appropriate data sharing agreements as of June 2025
    • third-party requests for data from the national pupil database (approved prior to April 2018 where data is still held by the requestor as at 30 June 2025)
    • data shares with Home Office
    • data shares with police and criminal investigation authorities
    • data shares through court orders

    DfE also provides external access to data under Section 64, Chapter 5, of the Digital Economy Act 2017 (https://www.legislation.gov.uk/ukpga/2017/30/section/64/enacted). Details of these data shares can be found in the UK Statistics Authority list of accredited projects (https://uksa.statisticsauthority.gov.uk/digitaleconomyact-research-statistics/better-useofdata-for-research-information-for-researchers/list-of-accredited-researchers-and-research-projects-under-the-research-strand-of-the-digital-economy-act/).

    Archive

    Previous external data shares can be viewed in the National Archives (https://webarchive.nationalarchives.gov.uk/ukgwa/timeline1/https://www.gov.uk/government/publications/dfe-external-data-shares).

    The data in the archived documents may not match DfE’s internal data request records due to definitions or business rules changing following process improvements.

  4. PII | External Dataset

    • kaggle.com
    zip
    Updated Jan 24, 2024
    Cite
    moth (2024). PII | External Dataset [Dataset]. https://www.kaggle.com/datasets/alejopaullier/pii-external-dataset
    Available download formats: zip (7923518 bytes)
    Dataset updated
    Jan 24, 2024
    Authors
    moth
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This is an LLM-generated external dataset for The Learning Agency Lab - PII Data Detection competition.

    Versions

    • v2: Added +1000 texts with new PII information such as URLs and usernames. The dataset now also includes the PII information as columns. Note that, on purpose, not all of the PII information is included in the text.

    Description

    It contains 4434 generated texts (previously 3382) with their corresponding annotated labels in the required competition format.

    Description:

    • document (str): ID of the essay
    • full_text (string): AI-generated text
    • tokens (string): a list with the tokens (comes from text.split())
    • trailing_whitespace (list): a list with boolean values indicating whether each token is followed by whitespace
    • labels (list): list with token labels in BIO format
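
    The fields described above can be illustrated with a minimal Python sketch. It assumes whitespace tokenization via text.split(), as the description states. The helper names, the example sentence, and the "NAME" entity type are hypothetical and are not taken from the dataset or the competition.

```python
from typing import Dict, List, Tuple


def tokenize_with_whitespace(full_text: str) -> Tuple[List[str], List[bool]]:
    """Split on whitespace and record whether each token is followed by whitespace."""
    tokens = full_text.split()
    trailing: List[bool] = []
    pos = 0
    for tok in tokens:
        start = full_text.index(tok, pos)  # locate the token in the original text
        pos = start + len(tok)
        trailing.append(pos < len(full_text) and full_text[pos].isspace())
    return tokens, trailing


def bio_labels(tokens: List[str], entities: Dict[str, List[str]]) -> List[str]:
    """Tag tokens in BIO format: B- opens an entity, I- continues it, O is outside."""
    labels = ["O"] * len(tokens)
    for entity_type, phrases in entities.items():
        for phrase in phrases:
            words = phrase.split()
            for i in range(len(tokens) - len(words) + 1):
                if tokens[i:i + len(words)] == words:
                    labels[i] = f"B-{entity_type}"
                    for j in range(i + 1, i + len(words)):
                        labels[j] = f"I-{entity_type}"
    return labels


# Hypothetical example text and entity list (not from the dataset).
text = "Contact Jane Doe at the library"
tokens, trailing_whitespace = tokenize_with_whitespace(text)
record = {
    "document": "0",  # hypothetical ID
    "full_text": text,
    "tokens": tokens,
    "trailing_whitespace": trailing_whitespace,
    "labels": bio_labels(tokens, {"NAME": ["Jane Doe"]}),
}
print(record["labels"])
```

    Because matching here works on whole whitespace-separated tokens, punctuation attached to a word (for example "Doe.") stays part of the token and has to appear in the phrase for it to be tagged.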

  5. Comparisons between the recruited sample and external data.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jul 9, 2013
    Cite
    Jones, Ruth; Watkins, Alan; Allen, Steven J.; Storey, Mel; Morgan, Gareth; Russell, Ian T.; Thornton, Catherine A.; Brooks, Caroline J.; Plummer, Sue F.; Heaven, Martin L.; Jordan, Sue; Garaiova, Iveta (2013). Comparisons between the recruited sample and external data. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001716768
    Dataset updated
    Jul 9, 2013
    Authors
    Jones, Ruth; Watkins, Alan; Allen, Steven J.; Storey, Mel; Morgan, Gareth; Russell, Ian T.; Thornton, Catherine A.; Brooks, Caroline J.; Plummer, Sue F.; Heaven, Martin L.; Jordan, Sue; Garaiova, Iveta
    Description

    Notes to table: No correction taken for multiple comparisons.
    (1) Deprivation (Townsend) scores, ranks and fifths are based on geographical area of residence, using Lower Super Output Areas (LSOAs) defined by postcodes. This measure of material deprivation is calculated from rates of unemployment, vehicle ownership, home ownership, and overcrowding [49].
    (2) In five cases, both parents were students, and ONS categories could not be allocated. Fathers’ occupations taken where no occupation for mother [44], [49].
    (3) As reported by mothers at recruitment at 36 weeks’ pregnancy.
    (4) As in hospital records.
    (5) Unequal sample sizes, unequal variances.

  6. Data from: Big Data versus a Survey

    • clevelandfed.org
    Updated Dec 31, 2014
    Cite
    Federal Reserve Bank of Cleveland (2014). Big Data versus a Survey [Dataset]. https://www.clevelandfed.org/publications/working-paper/2014/wp-1440-big-data-versus-a-survey
    Dataset updated
    Dec 31, 2014
    Dataset authored and provided by
    Federal Reserve Bank of Cleveland (https://www.clevelandfed.org/)
    Description

    Economists are shifting attention and resources from work on survey data to work on “big data.” This analysis is an empirical exploration of the trade-offs this transition requires. Parallel models are estimated using the Federal Reserve Bank of New York Consumer Credit Panel/Equifax and the Survey of Consumer Finances. After adjustments to account for different variable definitions and sampled populations, it is possible to arrive at similar models of total household debt. However, the estimates are sensitive to the adjustments. Little similarity is observed in parallel models of nonmortgage debt. While surveys intentionally collect theoretically related variables, it may be necessary to merge external data into commercial big data. In this example, some education and income measures are successfully integrated with the big data, but other external aggregates fail to adequately substitute for survey responses. Big data offers sample sizes, frequencies, and details that surveys cannot match. However, this example illustrates why caution is appropriate when attempting to substitute big data for a carefully executed survey.

  7. Data from: Augmenting the Control Arm of Randomized Trials by Incorporating...

    • tandf.figshare.com
    bin
    Updated Oct 10, 2025
    Cite
    Xun Xu; Ying Yuan; J. Jack Lee (2025). Augmenting the Control Arm of Randomized Trials by Incorporating Multiple External Data Sources Using Propensity Score Stratification and Data-Driven Mixture Prior [Dataset]. http://doi.org/10.6084/m9.figshare.29951984.v1
    Available download formats: bin
    Dataset updated
    Oct 10, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Xun Xu; Ying Yuan; J. Jack Lee
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    To enhance efficiency in drug development, interest in augmenting randomized controlled trials by supplementing the control arm with external data has grown rapidly. However, external data may lack between-population exchangeability. To facilitate proper information borrowing, we propose two two-stage strategies: the stratified propensity score self-adaptive mixture (SPS-SAM) prior and stratified propensity score calibrated elastic mixture (SPS-CEM) prior. The mixture prior is composed of an informative meta-analytic predictive (MAP) prior and a vague prior. In the first stage, propensity score (PS) stratification is performed to select similar subjects from external data. Within each stratum, to mitigate the measured confounding, we calculate the PS overlap coefficient to account for the between-group heterogeneity by adjusting the hyperparameters of the MAP prior. In the second stage, to reduce unmeasured confounding and address potential prior-data conflict, we construct a data-driven mixture prior incorporating an adaptive weight that dynamically controls the proportion of the MAP prior. To obtain the adaptive weight measuring the extent of congruence between the current and the external data, the SPS-SAM prior uses the likelihood ratio test and the SPS-CEM prior uses the scaled t-test. Compared with existing methods, simulation studies and illustrative examples demonstrate the superior features of the proposed methods. Both proposed methods outperform existing methods by yielding smaller bias and greater calibrated power, and by achieving accurate, efficient, and robust estimation of the treatment effect.
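
    The two-component mixture prior described in the abstract can be written generically as follows. This is a sketch of the standard mixture form only; the symbols (theta for the control-arm parameter, w for the adaptive weight) are notation assumed here, and the abstract does not give the formulas the authors use to compute w.

```latex
\pi_{\text{mix}}(\theta) \;=\; w \,\pi_{\text{MAP}}(\theta) \;+\; (1 - w)\,\pi_{\text{vague}}(\theta),
\qquad 0 \le w \le 1
```

    With w close to 1 the prior borrows heavily from the external data through the MAP component; with w close to 0 it falls back to the vague component, which limits borrowing when the current and external data conflict.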

  8. The Lick (External Examples, Non-strict)

    • kaggle.com
    zip
    Updated Jan 20, 2022
    + more versions
    Cite
    Andy Chamberlain (2022). The Lick (External Examples, Non-strict) [Dataset]. https://www.kaggle.com/datasets/andychamberlain/the-lick-external-examples-nonstrict
    Available download formats: zip (81833193 bytes)
    Dataset updated
    Jan 20, 2022
    Authors
    Andy Chamberlain
    Description

    Dataset

    This dataset was created by Andy Chamberlain

    Contents

  9. Business Activity Survey 2009 - Samoa

    • microdata.pacificdata.org
    Updated Jul 2, 2019
    + more versions
    Cite
    Samoa Bureau of Statistics (2019). Business Activity Survey 2009 - Samoa [Dataset]. https://microdata.pacificdata.org/index.php/catalog/253
    Dataset updated
    Jul 2, 2019
    Dataset authored and provided by
    Samoa Bureau of Statistics
    Time period covered
    2009
    Area covered
    Samoa
    Description

    Abstract

    The intention is to collect data for the calendar year 2009 (or the nearest year for which each business keeps its accounts). The survey is considered a one-off survey, although for accurate NAs, such a survey should be conducted at least every five years to enable regular updating of the ratios, etc., needed to adjust the ongoing indicator data (mainly VAGST) to NA concepts. The questionnaire will be drafted by FSD, largely following the previous BAS, updated to current accounting terminology where necessary. The questionnaire will be pilot tested, using some accountants who are likely to complete a number of the forms on behalf of their business clients, and a small sample of businesses. Consultations will also include Ministry of Finance, Ministry of Commerce, Industry and Labour, Central Bank of Samoa (CBS), Samoa Tourism Authority, Chamber of Commerce, and other business associations (hotels, retail, etc.).

    The questionnaire will collect a number of items of information about the business ownership, locations at which it operates and each establishment for which detailed data can be provided (in the case of complex businesses), contact information, and other general information needed to clearly identify each unique business. The main body of the questionnaire will collect data on income and expenses, to enable value added to be derived accurately. The questionnaire will also collect data on capital formation, and will contain supplementary pages for relevant industries to collect volume of production data for selected commodities and to collect information to enable an estimate of value added generated by key tourism activities.

    The principal user of the data will be FSD which will incorporate the survey data into benchmarks for the NA, mainly on the current published production measure of GDP. The information on capital formation and other relevant data will also be incorporated into the experimental estimates of expenditure on GDP. The supplementary data on volumes of production will be used by FSD to redevelop the industrial production index which has recently been transferred under the SBS from the CBS. The general information about the business ownership, etc., will be used to update the Business Register.

    Outputs will be produced in a number of formats, including a printed report containing descriptive information of the survey design, data tables, and analysis of the results. The report will also be made available on the SBS website in “.pdf” format, and the tables will be available on the SBS website in excel tables. Data by region may also be produced, although at a higher level of aggregation than the national data. All data will be fully confidentialised, to protect the anonymity of all respondents. Consideration may also be made to provide, for selected analytical users, confidentialised unit record files (CURFs).

    A high level of accuracy is needed because the principal purpose of the survey is to develop revised benchmarks for the NA. The initial plan was that the survey will be conducted as a stratified sample survey, with full enumeration of large establishments and a sample of the remainder.

    Geographic coverage

    National Coverage

    Analysis unit

    The main statistical unit to be used for the survey is the establishment. For simple businesses that undertake a single activity at a single location there is a one-to-one relationship between the establishment and the enterprise. For large and complex enterprises, however, it is desirable to separate each activity of an enterprise into establishments to provide the most detailed information possible for industrial analysis. The business register will need to be developed in such a way that records the links between establishments and their parent enterprises. The business register will be created from administrative records and may not have enough information to recognize all establishments of complex enterprises. Large businesses will be contacted prior to the survey post-out to determine if they have separate establishments. If so, the extended structure of the enterprise will be recorded on the business register and a questionnaire will be sent to the enterprise to be completed for each establishment.

    SBS has decided to follow the New Zealand simplified version of its statistical units model for the 2009 BAS. Future surveys may consider location units and enterprise groups if they are found to be useful for statistical collections.

    It should be noted that while establishment data may enable the derivation of detailed benchmark accounts, it may be necessary to aggregate up to enterprise level data for the benchmarks if the ongoing data used to extrapolate the benchmark forward (mainly VAGST) are only available at the enterprise level.

    Universe

    The BAS covered all employing units and excluded small non-employing units such as market sellers. The survey also excluded central government agencies engaged in public administration (ministries, public education and health, etc.). It covers only businesses that pay the VAGST (threshold: SAT$75,000 and upwards).

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    • Total sample size was 1240.
    • Of the 1240, 902 successfully completed the questionnaire.
    • The remaining 338 either never responded or were omitted (some businesses were omitted from the sample because they did not meet the requirement to be surveyed).
    • Selection was all employing units paying VAGST (threshold: SAT$75,000 upwards).

    WILL CONFIRM LATER!!

    OSO LE MEA E LE FAASA...AEA :-)

    Mode of data collection

    Mail Questionnaire [mail]

    Research instrument

    1. General instructions, authority for the survey, etc.;
    2. Business demography information on ownership, contact details, structure, etc.;
    3. Employment;
    4. Income;
    5. Expenses;
    6. Inventories;
    7. Profit or loss and reconciliation to business accounts' profit and loss;
    8. Fixed assets - purchases, disposals, book values;
    9. Thank you and signature of respondent.

    Supplementary Pages

    Additional pages have been prepared to collect data for a limited range of industries.

    1. Production data. To rebase and redevelop the Industrial Production Index (IPI), it is intended to collect volume of production information from a selection of large manufacturing businesses. The selection of businesses and products is critical to the usefulness of the IPI. The products must be homogeneous, and be of enough importance to the economy to justify collecting the data. Significance criteria should be established for the selection of products to include in the IPI, and the 2009 BAS provides an opportunity to collect benchmark data for a range of products known to be significant (based on information in the existing IPI, CPI weights, export data, etc.) as well as open questions for respondents to provide information on other significant products.

    2. Tourism. There is a strong demand for estimates of tourism value added. To estimate tourism value added using the international standard Tourism Satellite Account methodology requires the use of an input-output table, which is beyond the capacity of SBS at present. However, some indicative estimates of the main parts of the economy influenced by tourism can be derived if the necessary data are collected. Tourism is a demand concept, based on defining tourists (the international standard includes both international and domestic tourists), what products are characteristically purchased by tourists, and which industries supply those products. Some questions targeted at those industries that have significant involvement with tourists (hotels, restaurants, transport and tour operators, vehicle hire, etc.), on how much of their income is sourced from tourism would provide valuable indicators of the size of the direct impact of tourism.

    Cleaning operations

    Partial imputation was done at the time of receipt of questionnaires, after follow-up procedures to obtain fully completed questionnaires had been followed. Imputation followed a set process: ratios from responding units in the imputation cell were applied to the partial data that was supplied. Procedures were established during the editing stage (a) to preserve the integrity of the questionnaires as supplied by respondents, and (b) to record all changes made to the questionnaires during editing. If SBS staff write on the form, for example, this should only be done in red pen, to distinguish the alterations from the original information.
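
    The ratio-based imputation described above can be sketched in Python. This is an illustration under stated assumptions, not the SBS procedure (which the source says was carried out in Excel): the field names "income" and "expenses", the choice of income as the auxiliary variable, and the helper names are all hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def cell_ratio(responders: List[Record], target: str, auxiliary: str) -> float:
    """Ratio of summed target to summed auxiliary over units that reported both."""
    complete = [r for r in responders
                if r[target] is not None and r[auxiliary] is not None]
    total_aux = sum(r[auxiliary] for r in complete)
    if total_aux == 0:
        raise ValueError("no usable auxiliary data in this imputation cell")
    return sum(r[target] for r in complete) / total_aux


def impute(partial: Record, responders: List[Record],
           target: str, auxiliary: str) -> Record:
    """Return a copy of `partial` with a missing target filled as ratio * auxiliary."""
    filled = dict(partial)  # leave the record as supplied by the respondent untouched
    if filled[target] is None and filled[auxiliary] is not None:
        filled[target] = cell_ratio(responders, target, auxiliary) * filled[auxiliary]
    return filled


# Hypothetical imputation cell: two full responders and one partial responder.
cell = [
    {"income": 100.0, "expenses": 60.0},
    {"income": 300.0, "expenses": 180.0},
]
partial_unit = {"income": 200.0, "expenses": None}
print(impute(partial_unit, cell, target="expenses", auxiliary="income"))
```

    Returning a filled copy rather than editing the record in place mirrors the stated aim of preserving the questionnaire as supplied while keeping imputed values distinguishable.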

    Additional edit checks were developed, including checking against external data at enterprise/establishment level. External data to be checked against include VAGST and SNPF for turnover and purchases, and salaries and wages and employment data respectively. Editing and imputation processes were undertaken by FSD using Excel.

    Sampling error estimates

    NOT APPLICABLE!!

  10. Factori People Data | USA | Purchase, Behavior, Intent, Interest | Email,...

    • datarade.ai
    .json, .csv
    + more versions
    Cite
    Factori, Factori People Data | USA | Purchase, Behavior, Intent, Interest | Email, Address, Income, Insurance, Vehicle, Household | 100+ Attributes [Dataset]. https://datarade.ai/data-products/factori-consumer-graph-data-usa-purchase-behavior-inten-factori
    Available download formats: .json, .csv
    Dataset authored and provided by
    Factori
    Area covered
    United States
    Description

    Our People data is gathered and aggregated via surveys, digital services, and public data sources. We use powerful profiling algorithms to collect and ingest only fresh and reliable data points.

    Our comprehensive data enrichment solution includes a variety of data sets that can help you address gaps in your People data, gain a deeper understanding of your customers, and power superior client experiences.

    1. Geography - City, State, ZIP, County, CBSA, Census Tract, etc.
    2. Demographics - Gender, Age Group, Marital Status, Language, etc.
    3. Financial - Income Range, Credit Rating Range, Credit Type, Net Worth Range, etc.
    4. Persona - Consumer Type, Communication Preferences, Family Type, etc.
    5. Interests - Content, Brands, Shopping, Hobbies, Lifestyle, etc.
    6. Household - Number of Children, Number of Adults, IP Address, etc.
    7. Behaviours - Brand Affinity, App Usage, Web Browsing, etc.
    8. Firmographics - Industry, Company, Occupation, Revenue, etc.
    9. Retail Purchase - Store, Category, Brand, SKU, Quantity, Price, etc.
    10. Auto - Car Make, Model, Type, Year, etc.
    11. Housing - Home Type, Home Value, Renter/Owner, Year Built, etc.

    People Data Schema & Reach: Our data reach represents the total number of counts available within various categories and comprises attributes such as country location, MAU, DAU & Monthly Location Pings:

    Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method on a suitable interval (daily/weekly/monthly).

    People Data Use Cases:

    360-Degree Customer View: Get a comprehensive picture of customers by aggregating internal and external data.

    Data Enrichment: Leverage online-to-offline consumer profiles to build holistic audience segments and improve campaign targeting through user data enrichment.

    Fraud Detection: Use multiple digital (web and mobile) identities to verify real users and detect anomalies or fraudulent activity.

    Advertising & Marketing: Understand audience demographics, interests, lifestyle, hobbies, and behaviors to build targeted marketing campaigns.

    Here's the schema of People Data: person_id first_name last_name age gender linkedin_url twitter_url facebook_url city state address zip zip4 country delivery_point_bar_code carrier_route walk_seuqence_code fips_state_code fips_country_code country_name latitude longtiude address_type metropolitan_statistical_area core_based+statistical_area census_tract census_block_group census_block primary_address pre_address streer post_address address_suffix address_secondline address_abrev census_median_home_value home_market_value property_build+year property_with_ac property_with_pool property_with_water property_with_sewer general_home_value property_fuel_type year month household_id Census_median_household_income household_size marital_status length+of_residence number_of_kids pre_school_kids single_parents working_women_in_house_hold homeowner children adults generations net_worth education_level occupation education_history credit_lines credit_card_user newly_issued_credit_card_user credit_range_new
    credit_cards loan_to_value mortgage_loan2_amount mortgage_loan_type
    mortgage_loan2_type mortgage_lender_code
    mortgage_loan2_render_code
    mortgage_lender mortgage_loan2_lender
    mortgage_loan2_ratetype mortgage_rate
    mortgage_loan2_rate donor investor interest buyer hobby personal_email work_email devices phone employee_title employee_department employee_job_function skills recent_job_change company_id company_name company_description technologies_used office_address office_city office_country office_state office_zip5 office_zip4 office_carrier_route office_latitude office_longitude office_cbsa_code
    office_census_block_group
    office_census_tract office_county_code
    company_phone
    company_credit_score
    company_csa_code
    company_dpbc
    company_franchiseflag
    company_facebookurl company_linkedinurl company_twitterurl
    company_website company_fortune_rank
    company_government_type company_headquarters_branch company_home_business
    company_industry
    company_num_pcs_used
    company_num_employees
    company_firm_individual company_msa company_msa_name
    company_naics_code
    company_naics_description
    company_naics_code2 company_naics_description2
    company_sic_code2
    company_sic_code2_description
    company_sic_code4 company_sic_code4_description
    company_sic_code6
    company_sic_code6_description
    company_sic_code8
    company_sic_code8_description company_parent_company
    company_parent_company_location company_public_private company_subsidiary_company company_residential_business_code company_revenue_at_side_code company_revenue_range
    company_revenue company_sales_volume
    company_small_business company_stock_ticker company_year_founded company_minorityowned
    company_female_owned_or_operated company_franchise_code company_dma company_dma_name
    company_hq_address
    company_hq_city company_hq_duns company_hq_state
    company_hq_zip5 company_hq_zip4 company_sect...

  11. Taking Part 2010/11 quarter 4: Statistical release

    • gov.uk
    Updated Aug 9, 2011
    + more versions
    Cite
    Department for Digital, Culture, Media & Sport (2011). Taking Part 2010/11 quarter 4: Statistical release [Dataset]. https://www.gov.uk/government/statistics/taking-part-the-national-survey-of-culture-leisure-and-sport-2010-11
    Dataset updated
    Aug 9, 2011
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department for Digital, Culture, Media & Sport
    Description

    The latest estimates from the 2010/11 Taking Part adult survey produced by DCMS were released on 30 June 2011 according to the arrangements approved by the UK Statistics Authority.

    Released:

    30 June 2011

    Period covered:

    April 2010 to April 2011

    Geographic coverage:

    National and Regional level data for England.

    Next release date:

    Further analysis of the 2010/11 adult dataset and data for child participation will be published on 18 August 2011.

    Summary

    The latest data from the 2010/11 Taking Part survey provides reliable national estimates of adult engagement with sport, libraries, the arts, heritage and museums & galleries. This release also presents analysis on volunteering and digital participation in our sectors and a look at cycling and swimming proficiency in England. The Taking Part survey is a continuous annual survey of adults and children living in private households in England, and carries the National Statistics badge, meaning that it meets the highest standards of statistical quality.

    Statistical Report

    Statistical Worksheets

    These spreadsheets contain the data and sample sizes for each sector included in the survey:

    Previous release

    The previous Taking Part release was published on 31 March 2011 and can be found online.

    The UK Statistics Authority

    This release is published in accordance with the Code of Practice for Official Statistics (2009), as produced by the UK Statistics Authority (UKSA; http://www.statisticsauthority.gov.uk/). The UKSA has the overall objective of promoting and safeguarding the production and publication of official statistics that serve the public good. It monitors and reports on all official statistics, and promotes good practice in this area.

    Pre-release access

    The document below contains a list of Ministers and Officials who have received privileged early access to this release of Taking Part data. In line with best practice, the list has been kept to a minimum and those given access for briefing purposes had a maximum of 24 hours.

    The responsible statistician for this release is Neil Wilson. For any queries please contact the Taking Part team on 020 7211 6968 or takingpart@culture.gsi.gov.uk.

    Related information

  12. Data from: Indigenous Peoples' Data During COVID-19: From External to...

    • data.opendevelopmentmekong.net
    Updated Mar 29, 2021
    Cite
    (2021). Indigenous Peoples' Data During COVID-19: From External to Internal [Dataset]. https://data.opendevelopmentmekong.net/dataset/indigenous-peoples-data-during-covid-19-from-external-to-internal
    Dataset updated
    Mar 29, 2021
    Description

    This paper explores the particular issues that COVID-19 has highlighted for Indigenous Peoples focusing on data for governance. Drawing on current global examples, we underscore the inclusion of Indigenous Peoples in COVID-19 activities as the basis of data-related policy recommendations to increase the use of timely, relevant data for decision-making while reducing risk and harms.

  13. Data from: Statistical methods to control for confounders in rare disease...

    • tandf.figshare.com
    odt
    Updated Jun 19, 2025
    Cite
    Jiwei He; Di Zhang; Feng Li (2025). Statistical methods to control for confounders in rare disease settings that use external control [Dataset]. http://doi.org/10.6084/m9.figshare.25736878.v1
    Available download formats: odt
    Dataset updated
    Jun 19, 2025
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Jiwei He; Di Zhang; Feng Li
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    In drug development for rare diseases, the number of treated subjects in the clinical trial is often very small, whereas the number of external controls can be relatively large. There is no clear guidance on choosing an appropriate statistical method to control baseline confounding in this situation. To fill this gap, we conduct extensive simulations to evaluate the performance of commonly used matching and weighting methods as well as the more recently developed targeted maximum likelihood estimation (TMLE) and cardinality matching in small sample settings, mimicking the motivating data from a pediatric rare disease. Among the methods examined, the performance of coarsened exact matching (CEM) and TMLE is relatively robust under various model specifications. CEM is only feasible when the number of controls far exceeds the number of treated, whereas TMLE has better performance with less extreme treatment allocation ratios. Our simulations suggest that the bootstrap is useful for variance estimation in small samples after matching.

  14. Table1_Licensing of Orphan Medicinal Products—Use of Real-World Data and...

    • figshare.com
    • frontiersin.figshare.com
    xlsx
    Updated Jun 16, 2023
    + more versions
    Frauke Naumann-Winter; Franziska Wolter; Ulrike Hermes; Eva Malikova; Nils Lilienthal; Tania Meier; Maria Elisabeth Kalland; Armando Magrelli (2023). Table1_Licensing of Orphan Medicinal Products—Use of Real-World Data and Other External Data on Efficacy Aspects in Marketing Authorization Applications Concluded at the European Medicines Agency Between 2019 and 2021.XLSX [Dataset]. http://doi.org/10.3389/fphar.2022.920336.s002
    Available download formats: xlsx
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    Frontiers
    Authors
    Frauke Naumann-Winter; Franziska Wolter; Ulrike Hermes; Eva Malikova; Nils Lilienthal; Tania Meier; Maria Elisabeth Kalland; Armando Magrelli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Reference to so-called real-world data is more often made in marketing authorization applications for medicines intended to diagnose, prevent or treat rare diseases compared to more common diseases. We provide granularity on the type and aim of any external data on efficacy aspects from both real-world data sources and external trial data as discussed in regulatory submissions of orphan designated medicinal products in the EU. By quantifying the contribution of external data according to various regulatory characteristics, we aimed at identifying specific opportunities for external data in the field of orphan conditions.

    Methods: Information on external data in regulatory documents covering 72 orphan designations was extracted. Our sample comprised public assessment reports for approved, refused, or withdrawn applications concluded from 2019–2021 at the European Medicines Agency. Products with an active orphan designation at the time of submission were scrutinized regarding the role of external data on efficacy aspects in the context of marketing authorization applications, or on the criterion of “significant benefit” for the confirmation of the orphan designation at the time of licensing. The reports allowed a broad distinction between clinical development, regulatory decision making, and intended post-approval data collection. We defined three categories of external data, administrative data, structured clinical data, and external trial data (from clinical trials not sponsored by the applicant), and noted whether external data concerned the therapeutic context of the disease or the product under review.

    Results: While reference to external data with respect to efficacy aspects was included in 63% of the approved medicinal products in the field of rare diseases, 37% of marketing authorization applications were exclusively based on the dedicated clinical development plan for the product under review. Purely administrative data did not play any role in our sample of reports, but clinical data collected in a structured manner (from routine care or clinical research) were often used to inform on the trial design. Two additional recurrent themes for the use of external data were the contextualization of results, especially to confirm the orphan designation at the time of licensing, and reassurance of a large difference in treatment effect size or consistency of effects observed in clinical trials and practice. External data on the product under review were restricted to either active substances already belonging to the standard of care even before authorization or to compassionate use schemes. Furthermore, external data were considered pivotal for marketing authorization only exceptionally and only for active substances already in use within the specific therapeutic indication. Applications for the rarest conditions and those without authorized treatment alternatives were especially prominent with respect to the use of external data from real-world data sources both in the pre- and post-approval setting.

    Conclusion: Specific opportunities for external data in the setting of marketing authorizations in the field of rare diseases were identified. Ongoing initiatives of fostering systematic data collection are promising steps for a more efficient medicinal product development in the field of rare diseases.

  15. Synthetic Data for an Imaginary Country, Sample, 2023 - World

    • microdata.worldbank.org
    • nada-demo.ihsn.org
    Updated Jul 7, 2023
    + more versions
    Development Data Group, Data Analytics Unit (2023). Synthetic Data for an Imaginary Country, Sample, 2023 - World [Dataset]. https://microdata.worldbank.org/index.php/catalog/5906
    Dataset updated
    Jul 7, 2023
    Dataset authored and provided by
    Development Data Group, Data Analytics Unit
    Time period covered
    2023
    Area covered
    World
    Description

    Abstract

    The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other one with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, assets ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.

    The full-population dataset (with about 10 million individuals) is also distributed as open data.

    Geographic coverage

    The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.

    Analysis unit

    Household, Individual

    Universe

    The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.

    Kind of data

    ssd

    Sampling procedure

    The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
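The two-stage design described above can be sketched in Python. This is only an illustration under stated assumptions, not the R script distributed with the dataset: the `frame` structure, the function names, the use of household counts as the measure of stratum size, the largest-remainder rounding, and the simple random selection of enumeration areas within each stratum are all hypothetical choices. With 8,000 households and 25 per enumeration area, the design implies 320 enumeration areas in total.

```python
import random


def allocate_eas(stratum_sizes, n_eas):
    """Split n_eas across strata in proportion to stratum size.

    Uses largest-remainder rounding (an assumption; the source only says
    the allocation is proportional to the size of each stratum).
    """
    total = sum(stratum_sizes.values())
    exact = {s: n_eas * size / total for s, size in stratum_sizes.items()}
    alloc = {s: int(x) for s, x in exact.items()}
    leftover = n_eas - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


def draw_sample(frame, sample_size=8000, per_ea=25, seed=0):
    """Draw a two-stage sample from a hypothetical sampling frame.

    frame maps stratum -> {enumeration_area_id: [household_id, ...]}.
    Stage 1 picks enumeration areas per stratum (simple random selection,
    an assumption); stage 2 picks per_ea households in each chosen area.
    """
    rng = random.Random(seed)
    n_eas = sample_size // per_ea  # 8000 // 25 = 320 with the defaults
    sizes = {s: sum(len(hh) for hh in eas.values()) for s, eas in frame.items()}
    alloc = allocate_eas(sizes, n_eas)
    sample = []
    for stratum, eas in frame.items():
        for ea in rng.sample(sorted(eas), alloc[stratum]):
            sample.extend(rng.sample(eas[ea], per_ea))
    return sample
```

The sketch assumes every stratum has at least as many enumeration areas as it is allocated and every enumeration area has at least `per_ea` households; `random.sample` raises `ValueError` otherwise.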

    Mode of data collection

    other

    Research instrument

    The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.

    Cleaning operations

    The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observations were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to result in the distributed data files.

    Response rate

    This is a synthetic dataset; the "response rate" is 100%.

  16. Analysis of research data for 11 insitutions - Data Monitor

    • elsevier.digitalcommonsdata.com
    Updated May 29, 2020
    + more versions
    Elena Zudilova-Seinstra (2020). Analysis of research data for 11 insitutions - Data Monitor [Dataset]. http://doi.org/10.17632/k5p45z33kb.2
    Dataset updated
    May 29, 2020
    Authors
    Elena Zudilova-Seinstra
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We conducted an analysis to confirm our observations that only a very small percentage of public research data is hosted in the Institutional Data Repositories, while the vast majority is published in the open domain-specific and generalist data repositories.

    For this analysis, we selected 11 institutions, many of which have been our evaluation partners. For each institution, we counted the number of datasets published in their Institutional Data Repository (IDR) and tracked the number of public research datasets hosted in external data repositories via the Data Monitor API. External tracking was based on a corpus of more than 14 million data records checked against the institutional SciVal ID. One institution didn’t have an IDR.

    We found out that 10 out of 11 institutions had most of their public research data hosted outside of their institution, where by research data we mean not only datasets, but a broader notion that includes, for example, software.

    We will be happy to expand it by adding more institutions upon request.

    Note: This is version 2 of the earlier published dataset. The number of datasets published and tracked in the Monash Institutional Data Repository has been updated based on the information provided by the Monash Library. The number of datasets in the NTU Institutional Data Repository now includes datasets only. Dataverses were excluded to avoid double counting.

  17. Factori | US People Data - Acquisition Marketing & People Data Insights |...

    • datarade.ai
    .json, .csv
    Updated Jul 23, 2022
    + more versions
    Factori (2022). Factori | US People Data - Acquisition Marketing & People Data Insights | Append 100+ Attributes from 220M+ Consumer Profiles [Dataset]. https://datarade.ai/data-products/factori-usa-consumer-graph-data-acquisition-marketing-a-factori
    Available download formats: .json, .csv
    Dataset updated
    Jul 23, 2022
    Dataset authored and provided by
    Factori
    Area covered
    United States of America
    Description

    Our People data is gathered and aggregated via surveys, digital services, and public data sources. We use powerful profiling algorithms to collect and ingest only fresh and reliable data points.

    Our comprehensive data enrichment solution includes a variety of data sets that can help you address gaps in your People data, gain a deeper understanding of your customers, and power superior client experiences.

    1. Geography - City, State, ZIP, County, CBSA, Census Tract, etc.
    2. Demographics - Gender, Age Group, Marital Status, Language etc.
    3. Financial - Income Range, Credit Rating Range, Credit Type, Net worth Range, etc
    4. Persona - Consumer type, Communication preferences, Family type, etc
    5. Interests - Content, Brands, Shopping, Hobbies, Lifestyle etc.
    6. Household - Number of Children, Number of Adults, IP Address, etc.
    7. Behaviours - Brand Affinity, App Usage, Web Browsing etc.
    8. Firmographics - Industry, Company, Occupation, Revenue, etc
    9. Retail Purchase - Store, Category, Brand, SKU, Quantity, Price etc.
    10. Auto - Car Make, Model, Type, Year, etc.
    11. Housing - Home type, Home value, Renter/Owner, Year Built etc.

    People Data Schema & Reach: Our data reach represents the total number of counts available within various categories and comprises attributes such as country location, MAU, DAU & Monthly Location Pings:

    Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method on a suitable interval (daily/weekly/monthly).

    People Data Use Cases: 360-Degree Customer View: Get a comprehensive image of customers by the means of internal and external data aggregation.

    Data Enrichment: Leverage Online to offline consumer profiles to build holistic audience segments to improve campaign targeting using user data enrichment

    Fraud Detection: Use multiple digital (web and mobile) identities to verify real users and detect anomalies or fraudulent activity.

    Advertising & Marketing: Understand audience demographics, interests, lifestyle, hobbies, and behaviors to build targeted marketing campaigns.

    Here's the schema of People Data:

    person_id first_name last_name age gender linkedin_url twitter_url facebook_url city state address zip zip4 country delivery_point_bar_code carrier_route walk_seuqence_code fips_state_code fips_country_code country_name latitude longtiude address_type metropolitan_statistical_area core_based+statistical_area census_tract census_block_group census_block primary_address pre_address streer post_address address_suffix address_secondline address_abrev census_median_home_value home_market_value property_build+year property_with_ac property_with_pool property_with_water property_with_sewer general_home_value property_fuel_type year month household_id Census_median_household_income household_size marital_status length+of_residence number_of_kids pre_school_kids single_parents working_women_in_house_hold homeowner children adults generations net_worth education_level occupation education_history credit_lines credit_card_user newly_issued_credit_card_user credit_range_new credit_cards loan_to_value mortgage_loan2_amount mortgage_loan_type mortgage_loan2_type mortgage_lender_code mortgage_loan2_render_code mortgage_lender mortgage_loan2_lender mortgage_loan2_ratetype mortgage_rate mortgage_loan2_rate donor investor interest buyer hobby personal_email work_email devices phone employee_title employee_department employee_job_function skills recent_job_change company_id company_name company_description technologies_used office_address office_city office_country office_state office_zip5 office_zip4 office_carrier_route office_latitude office_longitude office_cbsa_code office_census_block_group office_census_tract office_county_code company_phone company_credit_score company_csa_code company_dpbc company_franchiseflag company_facebookurl company_linkedinurl company_twitterurl company_website company_fortune_rank company_government_type company_headquarters_branch company_home_business company_industry company_num_pcs_used company_num_employees company_firm_individual company_msa company_msa_name company_naics_code company_naics_description company_naics_code2 company_naics_description2 company_sic_code2 company_sic_code2_description company_sic_code4 company_sic_code4_description company_sic_code6 company_sic_code6_description company_sic_code8 company_sic_code8_description company_parent_company company_parent_company_location company_public_private company_subsidiary_company company_residential_business_code company_revenue_at_side_code company_revenue_range company_revenue company_sales_volume company_small_business company_stock_ticker company_year_founded company_minorityowned company_female_owned_or_operated company_franchise_code company_dma company_dma_name company_hq_address company_hq_city company_hq_duns company_hq_state company_hq_zip5 company_hq_zip4 company_sec...

  18. Factori AI & ML Training Data | People Data | USA | Machine Learning Data

    • datarade.ai
    .json, .csv
    Updated Jul 23, 2022
    Factori (2022). Factori AI & ML Training Data | People Data | USA | Machine Learning Data [Dataset]. https://datarade.ai/data-products/factori-ai-ml-training-data-consumer-data-usa-machine-factori
    Available download formats: .json, .csv
    Dataset updated
    Jul 23, 2022
    Dataset authored and provided by
    Factori
    Area covered
    United States of America
    Description

    Our People data is gathered and aggregated via surveys, digital services, and public data sources. We use powerful profiling algorithms to collect and ingest only fresh and reliable data points.

    Our comprehensive data enrichment solution includes a variety of data sets that can help you address gaps in your customer data, gain a deeper understanding of your customers, and power superior client experiences.

    1. Geography - City, State, ZIP, County, CBSA, Census Tract, etc.
    2. Demographics - Gender, Age Group, Marital Status, Language etc.
    3. Financial - Income Range, Credit Rating Range, Credit Type, Net worth Range, etc
    4. Persona - Consumer type, Communication preferences, Family type, etc
    5. Interests - Content, Brands, Shopping, Hobbies, Lifestyle etc.
    6. Household - Number of Children, Number of Adults, IP Address, etc.
    7. Behaviours - Brand Affinity, App Usage, Web Browsing etc.
    8. Firmographics - Industry, Company, Occupation, Revenue, etc
    9. Retail Purchase - Store, Category, Brand, SKU, Quantity, Price etc.
    10. Auto - Car Make, Model, Type, Year, etc.
    11. Housing - Home type, Home value, Renter/Owner, Year Built etc.

    People Data Schema & Reach: Our data reach represents the total number of counts available within various categories and comprises attributes such as country location, MAU, DAU & Monthly Location Pings:

    Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method on a suitable interval (daily/weekly/monthly).

    People data Use Cases:

    360-Degree Customer View: Get a comprehensive image of customers by the means of internal and external data aggregation.

    Data Enrichment: Leverage Online to offline consumer profiles to build holistic audience segments to improve campaign targeting using user data enrichment

    Fraud Detection: Use multiple digital (web and mobile) identities to verify real users and detect anomalies or fraudulent activity.

    Advertising & Marketing: Understand audience demographics, interests, lifestyle, hobbies, and behaviors to build targeted marketing campaigns.

    Here's the schema of People Data:

    person_id first_name last_name age gender linkedin_url twitter_url facebook_url city state address zip zip4 country delivery_point_bar_code carrier_route walk_seuqence_code fips_state_code fips_country_code country_name latitude longtiude address_type metropolitan_statistical_area core_based+statistical_area census_tract census_block_group census_block primary_address pre_address streer post_address address_suffix address_secondline address_abrev census_median_home_value home_market_value property_build+year property_with_ac property_with_pool property_with_water property_with_sewer general_home_value property_fuel_type year month household_id Census_median_household_income household_size marital_status length+of_residence number_of_kids pre_school_kids single_parents working_women_in_house_hold homeowner children adults generations net_worth education_level occupation education_history credit_lines credit_card_user newly_issued_credit_card_user credit_range_new credit_cards loan_to_value mortgage_loan2_amount mortgage_loan_type mortgage_loan2_type mortgage_lender_code mortgage_loan2_render_code mortgage_lender mortgage_loan2_lender mortgage_loan2_ratetype mortgage_rate mortgage_loan2_rate donor investor interest buyer hobby personal_email work_email devices phone employee_title employee_department employee_job_function skills recent_job_change company_id company_name company_description technologies_used office_address office_city office_country office_state office_zip5 office_zip4 office_carrier_route office_latitude office_longitude office_cbsa_code office_census_block_group office_census_tract office_county_code company_phone company_credit_score company_csa_code company_dpbc company_franchiseflag company_facebookurl company_linkedinurl company_twitterurl company_website company_fortune_rank company_government_type company_headquarters_branch company_home_business company_industry company_num_pcs_used company_num_employees company_firm_individual company_msa company_msa_name company_naics_code company_naics_description company_naics_code2 company_naics_description2 company_sic_code2 company_sic_code2_description company_sic_code4 company_sic_code4_description company_sic_code6 company_sic_code6_description company_sic_code8 company_sic_code8_description company_parent_company company_parent_company_location company_public_private company_subsidiary_company company_residential_business_code company_revenue_at_side_code company_revenue_range company_revenue company_sales_volume company_small_business company_stock_ticker company_year_founded company_minorityowned company_female_owned_or_operated company_franchise_code company_dma company_dma_name company_hq_address company_hq_city company_hq_duns company_hq_state company_hq_zip5 company_hq_zip4 company_se...

  19. Data Management Plan Examples Database

    • search.dataone.org
    • borealisdata.ca
    Updated Sep 4, 2024
    Evering, Danica; Acharya, Shrey; Pratt, Isaac; Behal, Sarthak (2024). Data Management Plan Examples Database [Dataset]. http://doi.org/10.5683/SP3/SDITUG
    Dataset updated
    Sep 4, 2024
    Dataset provided by
    Borealis
    Authors
    Evering, Danica; Acharya, Shrey; Pratt, Isaac; Behal, Sarthak
    Time period covered
    Jan 1, 2011 - Jan 1, 2023
    Description

    This dataset comprises a collection of example DMPs from a wide array of fields, obtained from a number of different sources outlined below. Data extracted from the examples include the discipline and field of study, author, institutional affiliation and funding information, location, date created, title, research and data type, description of project, link to the DMP, and, where possible, external links to related publications or grant pages. This CSV document serves as the content for a McMaster Data Management Plan (DMP) Database as part of the Research Data Management (RDM) Services website, located at https://u.mcmaster.ca/dmps. Other universities and organizations are encouraged to link to the DMP Database or use this dataset as the content for their own DMP Database. This dataset will be updated regularly to include new additions and will be versioned as such. We are gathering submissions at https://u.mcmaster.ca/submit-a-dmp to continue to expand the collection.

  20. 2021 Census: Safeguarded Household Microdata Sample (England and Wales)

    • beta.ukdataservice.ac.uk
    • datacatalogue.ukdataservice.ac.uk
    Updated Dec 17, 2024
    + more versions
    Office for National Statistics (2024). 2021 Census: Safeguarded Household Microdata Sample (England and Wales) [Dataset]. http://doi.org/10.5255/UKDA-SN-9156-1
    Dataset updated
    Dec 17, 2024
    Dataset provided by
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    Office for National Statistics
    Time period covered
    Mar 21, 2021
    Area covered
    Wales, England
    Description
    The 2021 UK Census was the 23rd official census of the United Kingdom. The UK Census is generally conducted once every 10 years, and the 2021 censuses of England, Wales, and Northern Ireland took place on 21 March 2021. In Scotland, the decision was made to move the census to March 2022 because of the impact of the coronavirus pandemic (see SNs 9461 and 9462). The censuses were administered by the Office for National Statistics (ONS), the Northern Ireland Statistics and Research Agency (NISRA) and National Records of Scotland (NRS), respectively.

    Census 2021 was the first census with a digital-first design, encouraging participants to respond online rather than on a paper questionnaire. Support was given to people who could not respond online, including paper questionnaires, telephone contact centres, field force support, and an extended collection period.

    Topics covered in the 2021 UK Census included:

    • demography and migration
    • ethnic group, national identity, language and religion
    • labour market and travel to work
    • housing
    • education
    • health, disability, and unpaid care
    • Welsh and other languages
    • UK armed forces veterans
    • sexual orientation and gender identity.

    The 2021 Census: Safeguarded Household Microdata Sample dataset consists of a random sample of 1% of households from the 2021 Census and contains records for all individuals within these sampled households. It includes records for 263,729 households and 606,210 persons. These data cover England and Wales only. This sample allows linkage between individuals in the same household.  The lowest level of geography is Wales and regions within England. It contains 56 variables and a low level of detail. This is a new ONS product following user feedback from the 2011 Census.
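The sampling unit here is the household, not the person: a household is drawn at random and every person in it is kept, which is what makes within-household linkage possible (the stated counts work out to roughly 2.3 persons per sampled household). A minimal Python sketch of that idea follows. It is an illustration only, not the ONS procedure: the function name, the record layout with `household_id` and `person_id` keys, and the rounding of the household count are all assumptions.

```python
import random


def sample_households(persons, fraction=0.01, seed=0):
    """Keep all person records belonging to a random subset of households.

    persons is a list of dicts with hypothetical keys 'household_id' and
    'person_id'. Households, not persons, are the units being sampled.
    """
    rng = random.Random(seed)
    household_ids = sorted({p["household_id"] for p in persons})
    k = max(1, round(len(household_ids) * fraction))  # number of households to draw
    chosen = set(rng.sample(household_ids, k))
    # Every member of a chosen household is retained, so household
    # composition stays intact in the sample.
    return [p for p in persons if p["household_id"] in chosen]
```

Sampling persons directly at the same rate would instead scatter household members across the sampled and unsampled sets, and the linkage between individuals in the same household would be lost.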

    Census Microdata

    Microdata are small samples of individual records from a single census from which identifying information has been removed. They contain a range of individual and household characteristics and can be used to carry out analysis not possible from standard census outputs, such as:

    • creating tables using bespoke variable combinations
    • investigating specific combinations of variables or categories in a high level of detail
    • conducting non-tabular statistical analyses on record-level data.

    The microdata samples are designed to protect the confidentiality of individuals and households. This is done by applying access controls and removing information that might directly identify a person, such as names, addresses and date of birth. Record swapping is applied to the census data used to create the microdata samples. This is a statistical disclosure control (SDC) method, which makes very small changes to the data to prevent the identification of individuals. The microdata samples use further SDC methods, such as collapsing variables and restricting detail. The samples also include records that have been edited to prevent inconsistent data and contain imputed persons, households, and data values. To protect confidentiality, imputation flags are not included in any 2021 Census microdata sample.
