In 2020, according to surveyed respondents, data masters typically leveraged a variety of external data sources to enhance their insights. The most popular external data sources for data masters were publicly available competitor data, open data, and proprietary datasets from data aggregators, with **, **, and ** percent, respectively.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
| read_count | molecule_smiles | binds | is_triazine | buildingblock1_id | buildingblock2_id | buildingblock3_id |
|---|---|---|---|---|---|---|
| 0 | "O=C(CCn1ccc(C(=O)N2CCN(c3ncncn… | 0 | true | 3501 | 3844 | 3655 |
| 0 | "CC(C)C[C@H](NC(=O)CCn1ccc(C(=O… | 0 | true | 5933 | 3844 | 3655 |
| 0 | "CN1CCC(CN([Dy])C(=O)CCn2ccc(C(… | 0 | true | 4232 | 3844 | 3655 |
| 0 | "Cc1c(F)ccc(CN([Dy])C(=O)CCn2cc… | 0 | true | 2689 | 3844 | 3655 |
| 0 | "O=C(CCn1ccc(C(=O)N2CCN(c3ncncn… | 0 | true | 4909 | 3844 | 3655 |
‘DfE external data shares’ includes:
DfE also provides external access to data under Section 64, Chapter 5, of the Digital Economy Act 2017 (https://www.legislation.gov.uk/ukpga/2017/30/section/64/enacted). Details of these data shares can be found in the UK Statistics Authority list of accredited projects (https://uksa.statisticsauthority.gov.uk/digitaleconomyact-research-statistics/better-useofdata-for-research-information-for-researchers/list-of-accredited-researchers-and-research-projects-under-the-research-strand-of-the-digital-economy-act/).
Previous external data shares can be viewed in the National Archives (https://webarchive.nationalarchives.gov.uk/ukgwa/timeline1/https://www.gov.uk/government/publications/dfe-external-data-shares).
The data in the archived documents may not match DfE’s internal data request records due to definitions or business rules changing following process improvements.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This is an LLM-generated external dataset for The Learning Agency Lab - PII Data Detection competition.
It contains 3382 4434 generated texts with their corresponding annotated labels in the required competition format.
Description:
- document (str): ID of the essay
- full_text (string): AI generated text.
- tokens (list): a list of the tokens (produced by text.split())
- trailing_whitespace (list): a list with boolean values indicating whether each token is followed by whitespace.
- labels (list): list with token labels in BIO format
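As a rough illustration of how these fields relate to one another, the sketch below rebuilds `tokens` and `trailing_whitespace` from a text and assigns BIO labels to a token span. The helper names, the sample sentence, and the `NAME` label are assumptions made for illustration; they are not taken from the dataset or the competition.

```python
import re

def tokenize(full_text):
    """Return whitespace-split tokens plus, per token, whether whitespace follows it."""
    tokens, trailing_whitespace = [], []
    for match in re.finditer(r"\S+", full_text):
        tokens.append(match.group())
        end = match.end()
        trailing_whitespace.append(end < len(full_text) and full_text[end].isspace())
    return tokens, trailing_whitespace

def bio_labels(n_tokens, spans):
    """Build BIO labels from (start, end_exclusive, label) token spans; 'O' means outside."""
    labels = ["O"] * n_tokens
    for start, end, label in spans:
        labels[start] = "B-" + label          # first token of the entity
        for i in range(start + 1, end):
            labels[i] = "I-" + label          # continuation tokens
    return labels

text = "Contact Jane Doe today"
tokens, trailing_whitespace = tokenize(text)
labels = bio_labels(len(tokens), [(1, 3, "NAME")])  # hypothetical entity span and label
print(list(zip(tokens, trailing_whitespace, labels)))
```

The three lists stay aligned by position, so the original text can be rebuilt by joining each token with a space wherever its flag is true.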
Notes to table: No correction taken for multiple comparisons.
1 Deprivation (Townsend) scores, ranks and fifths are based on geographical area of residence, using Lower Super Output Areas (LSOAs) defined by postcodes. This measure of material deprivation is calculated from rates of unemployment, vehicle ownership, home ownership, and overcrowding [49].
2 In five cases, both parents were students, and ONS categories could not be allocated. Fathers’ occupations taken where no occupation for mother [44], [49].
3 As reported by mothers at recruitment at 36 weeks’ pregnancy.
4 As in hospital records.
5 Unequal sample sizes, unequal variances.
Economists are shifting attention and resources from work on survey data to work on “big data.” This analysis is an empirical exploration of the trade-offs this transition requires. Parallel models are estimated using the Federal Reserve Bank of New York Consumer Credit Panel/Equifax and the Survey of Consumer Finances. After adjustments to account for different variable definitions and sampled populations, it is possible to arrive at similar models of total household debt. However, the estimates are sensitive to the adjustments. Little similarity is observed in parallel models of nonmortgage debt. While surveys intentionally collect theoretically related variables, it may be necessary to merge external data into commercial big data. In this example, some education and income measures are successfully integrated with the big data, but other external aggregates fail to adequately substitute for survey responses. Big data offers sample sizes, frequencies, and details that surveys cannot match. However, this example illustrates why caution is appropriate when attempting to substitute big data for a carefully executed survey.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To enhance efficiency in drug development, interest in augmenting randomized controlled trials by supplementing the control arm with external data has grown rapidly. However, external data may lack between-population exchangeability. To facilitate proper information borrowing, we propose two two-stage strategies: the stratified propensity score self-adaptive mixture (SPS-SAM) prior and the stratified propensity score calibrated elastic mixture (SPS-CEM) prior. The mixture prior is composed of an informative meta-analytic predictive (MAP) prior and a vague prior. In the first stage, propensity score (PS) stratification is performed to select similar subjects from external data. Within each stratum, to mitigate the measured confounding, we calculate the PS overlap coefficient to account for the between-group heterogeneity by adjusting the hyperparameters of the MAP prior. In the second stage, to reduce unmeasured confounding and address potential prior-data conflict, we construct a data-driven mixture prior incorporating an adaptive weight that dynamically controls the proportion of the MAP prior. To obtain the adaptive weight measuring the extent of congruence between the current and the external data, the SPS-SAM prior uses the likelihood ratio test and the SPS-CEM prior uses the scaled t-test. Compared with existing methods, simulation studies and illustrative examples demonstrate the superior features of the proposed methods. Both proposed methods outperform existing methods by yielding smaller bias, greater calibrated power, and achieving accurate, efficient, and robust estimation of the treatment effect.
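In generic notation, a two-component mixture prior of the kind described here can be written as below. The symbols $\theta$ (the treatment-effect parameter), $w$ (the adaptive weight), and the density names are assumptions introduced for illustration; the abstract does not give the authors' exact formulation.

```latex
\pi_{\text{mix}}(\theta) = w \, \pi_{\text{MAP}}(\theta) + (1 - w) \, \pi_{\text{vague}}(\theta),
\qquad 0 \le w \le 1
```

A weight $w$ close to 1 borrows heavily from the external data, whereas a weight close to 0 falls back on the vague prior.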
This dataset was created by Andy Chamberlain
The intention is to collect data for the calendar year 2009 (or the nearest year for which each business keeps its accounts). The survey is considered a one-off survey, although for accurate national accounts (NAs), such a survey should be conducted at least every five years to enable regular updating of the ratios, etc., needed to adjust the ongoing indicator data (mainly VAGST) to NA concepts. The questionnaire will be drafted by FSD, largely following the previous BAS, updated to current accounting terminology where necessary. The questionnaire will be pilot tested, using some accountants who are likely to complete a number of the forms on behalf of their business clients, and a small sample of businesses. Consultations will also include Ministry of Finance, Ministry of Commerce, Industry and Labour, Central Bank of Samoa (CBS), Samoa Tourism Authority, Chamber of Commerce, and other business associations (hotels, retail, etc.).
The questionnaire will collect a number of items of information about the business ownership, locations at which it operates and each establishment for which detailed data can be provided (in the case of complex businesses), contact information, and other general information needed to clearly identify each unique business. The main body of the questionnaire will collect data on income and expenses, to enable value added to be derived accurately. The questionnaire will also collect data on capital formation, and will contain supplementary pages for relevant industries to collect volume of production data for selected commodities and to collect information to enable an estimate of value added generated by key tourism activities.
The principal user of the data will be FSD which will incorporate the survey data into benchmarks for the NA, mainly on the current published production measure of GDP. The information on capital formation and other relevant data will also be incorporated into the experimental estimates of expenditure on GDP. The supplementary data on volumes of production will be used by FSD to redevelop the industrial production index which has recently been transferred under the SBS from the CBS. The general information about the business ownership, etc., will be used to update the Business Register.
Outputs will be produced in a number of formats, including a printed report containing descriptive information of the survey design, data tables, and analysis of the results. The report will also be made available on the SBS website in “.pdf” format, and the tables will be available on the SBS website as Excel tables. Data by region may also be produced, although at a higher level of aggregation than the national data. All data will be fully confidentialised, to protect the anonymity of all respondents. Consideration may also be given to providing confidentialised unit record files (CURFs) for selected analytical users.
A high level of accuracy is needed because the principal purpose of the survey is to develop revised benchmarks for the NA. The initial plan was that the survey would be conducted as a stratified sample survey, with full enumeration of large establishments and a sample of the remainder.
National Coverage
The main statistical unit to be used for the survey is the establishment. For simple businesses that undertake a single activity at a single location there is a one-to-one relationship between the establishment and the enterprise. For large and complex enterprises, however, it is desirable to separate each activity of an enterprise into establishments to provide the most detailed information possible for industrial analysis. The business register will need to be developed in such a way that it records the links between establishments and their parent enterprises. The business register will be created from administrative records and may not have enough information to recognize all establishments of complex enterprises. Large businesses will be contacted prior to the survey post-out to determine whether they have separate establishments. If so, the extended structure of the enterprise will be recorded on the business register and a questionnaire will be sent to the enterprise to be completed for each establishment.
SBS has decided to follow the New Zealand simplified version of its statistical units model for the 2009 BAS. Future surveys may consider location units and enterprise groups if they are found to be useful for statistical collections.
It should be noted that while establishment data may enable the derivation of detailed benchmark accounts, it may be necessary to aggregate up to enterprise level data for the benchmarks if the ongoing data used to extrapolate the benchmark forward (mainly VAGST) are only available at the enterprise level.
The BASs covered all employing units and excluded small non-employing units such as market sellers. The surveys also excluded central government agencies engaged in public administration (ministries, public education and health, etc.). The survey only covers businesses that pay the VAGST (threshold SAT$75,000 and upwards).
Sample survey data [ssd]
- Total sample size was 1,240.
- Of the 1,240, 902 successfully completed the questionnaire.
- The remaining 338 either never responded or were omitted (some businesses were omitted from the sample because they did not meet the requirements to be surveyed).
- Selection covered all employing units paying VAGST (threshold SAT$75,000 and upwards).
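The sample counts reported here can be checked with a few lines of arithmetic. This is only a consistency sketch: the variable names are illustrative, and the completion share is not a formal response rate, because the 338 remaining units include businesses that were omitted from the sample.

```python
total_sample = 1240
completed = 902

# Units without a completed questionnaire: non-respondents plus omitted businesses.
not_completed = total_sample - completed

# Share of the selected sample that returned a completed questionnaire.
completion_share = completed / total_sample

print(not_completed, f"{completion_share:.1%}")
```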
WILL CONFIRM LATER!!
OSO LE MEA E LE FAASA...AEA :-)
Mail Questionnaire [mail]
Supplementary Pages
Additional pages have been prepared to collect data for a limited range of industries.
1. Production data. To rebase and redevelop the Industrial Production Index (IPI), it is intended to collect volume of production information from a selection of large manufacturing businesses. The selection of businesses and products is critical to the usefulness of the IPI. The products must be homogeneous, and be of enough importance to the economy to justify collecting the data. Significance criteria should be established for the selection of products to include in the IPI, and the 2009 BAS provides an opportunity to collect benchmark data for a range of products known to be significant (based on information in the existing IPI, CPI weights, export data, etc.) as well as open questions for respondents to provide information on other significant products.
2. Tourism. There is a strong demand for estimates of tourism value added. To estimate tourism value added using the international standard Tourism Satellite Account methodology requires the use of an input-output table, which is beyond the capacity of SBS at present. However, some indicative estimates of the main parts of the economy influenced by tourism can be derived if the necessary data are collected. Tourism is a demand concept, based on defining tourists (the international standard includes both international and domestic tourists), what products are characteristically purchased by tourists, and which industries supply those products. Some questions targeted at those industries that have significant involvement with tourists (hotels, restaurants, transport and tour operators, vehicle hire, etc.), on how much of their income is sourced from tourism, would provide valuable indicators of the size of the direct impact of tourism.
Partial imputation was done at the time of receipt of questionnaires, after follow-up procedures to obtain fully completed questionnaires had been followed. Imputation applied ratios from responding units in the imputation cell to the partial data that was supplied. Procedures were established during the editing stage (a) to preserve the integrity of the questionnaires as supplied by respondents, and (b) to record all changes made to the questionnaires during editing. If SBS staff write on the form, for example, this should only be done in red pen, to distinguish the alterations from the original information.
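A minimal sketch of ratio imputation within one imputation cell follows. It assumes the ratio is the cell total of the missing item divided by the cell total of a reported auxiliary item (turnover here), taken over units that supplied both. The function name, field names, and figures are hypothetical; the source does not specify which ratios were used.

```python
def impute_by_ratio(cell, target, auxiliary):
    """Fill missing `target` values using the target/auxiliary ratio of full respondents.

    `cell` is a list of dicts for one imputation cell; None marks a missing value.
    Imputed records are flagged so that supplied data stay distinguishable.
    """
    donors = [r for r in cell if r[target] is not None and r[auxiliary] is not None]
    ratio = sum(r[target] for r in donors) / sum(r[auxiliary] for r in donors)
    result = []
    for record in cell:
        record = dict(record)  # work on a copy; the supplied record is left untouched
        if record[target] is None and record[auxiliary] is not None:
            record[target] = ratio * record[auxiliary]
            record["imputed"] = True
        else:
            record["imputed"] = False
        result.append(record)
    return result

cell = [
    {"id": "A", "turnover": 200.0, "expenses": 120.0},
    {"id": "B", "turnover": 300.0, "expenses": 180.0},
    {"id": "C", "turnover": 100.0, "expenses": None},  # partial response
]
imputed = impute_by_ratio(cell, target="expenses", auxiliary="turnover")
print(imputed[2])
```

Returning flagged copies rather than overwriting the input mirrors the stated aims of preserving the questionnaires as supplied and recording every change.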
Additional edit checks were developed, including checking against external data at enterprise/establishment level. The external data checked against include VAGST for turnover and purchases, and SNPF for salaries and wages and employment data. Editing and imputation processes were undertaken by FSD using Excel.
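One way such an edit check might look is sketched below: compare the turnover reported on the questionnaire with the figure held in an external source such as VAGST, and flag units whose relative difference exceeds a tolerance. The source states that the work was done in Excel, so the Python here is only a stand-in; the 10% tolerance, the function name, and the sample figures are assumptions for illustration.

```python
def flag_discrepancies(reported, external, tolerance=0.10):
    """Return ids whose reported value differs from the external value by more than `tolerance`.

    Both arguments map a unit id to a value; units without a usable external value are skipped.
    """
    flagged = []
    for unit_id, value in reported.items():
        reference = external.get(unit_id)
        if reference is None or reference == 0:
            continue  # nothing to compare against
        if abs(value - reference) / abs(reference) > tolerance:
            flagged.append(unit_id)
    return flagged

reported_turnover = {"E1": 100_000, "E2": 250_000, "E3": 80_000}
vagst_turnover = {"E1": 102_000, "E2": 400_000}  # hypothetical external figures
flagged_units = flag_discrepancies(reported_turnover, vagst_turnover)
print(flagged_units)
```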
NOT APPLICABLE!!
Our People data is gathered and aggregated via surveys, digital services, and public data sources. We use powerful profiling algorithms to collect and ingest only fresh and reliable data points.
Our comprehensive data enrichment solution includes a variety of data sets that can help you address gaps in your People data, gain a deeper understanding of your customers, and power superior client experiences.
1. Geography - City, State, ZIP, County, CBSA, Census Tract, etc.
2. Demographics - Gender, Age Group, Marital Status, Language, etc.
3. Financial - Income Range, Credit Rating Range, Credit Type, Net Worth Range, etc.
4. Persona - Consumer type, Communication preferences, Family type, etc.
5. Interests - Content, Brands, Shopping, Hobbies, Lifestyle, etc.
6. Household - Number of Children, Number of Adults, IP Address, etc.
7. Behaviours - Brand Affinity, App Usage, Web Browsing, etc.
8. Firmographics - Industry, Company, Occupation, Revenue, etc.
9. Retail Purchase - Store, Category, Brand, SKU, Quantity, Price, etc.
10. Auto - Car Make, Model, Type, Year, etc.
11. Housing - Home type, Home value, Renter/Owner, Year Built, etc.
People Data Schema & Reach: Our data reach represents the total number of counts available within various categories and comprises attributes such as country location, MAU, DAU & Monthly Location Pings:
Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method on a suitable interval (daily/weekly/monthly).
People Data Use Cases:
360-Degree Customer View: Get a comprehensive image of customers by means of internal and external data aggregation.
Data Enrichment: Leverage online-to-offline consumer profiles to build holistic audience segments and improve campaign targeting through user data enrichment.
Fraud Detection: Use multiple digital (web and mobile) identities to verify real users and detect anomalies or fraudulent activity.
Advertising & Marketing: Understand audience demographics, interests, lifestyle, hobbies, and behaviors to build targeted marketing campaigns.
Here's the schema of People Data:
person_id
first_name
last_name
age
gender
linkedin_url
twitter_url
facebook_url
city
state
address
zip
zip4
country
delivery_point_bar_code
carrier_route
walk_seuqence_code
fips_state_code
fips_country_code
country_name
latitude
longtiude
address_type
metropolitan_statistical_area
core_based+statistical_area
census_tract
census_block_group
census_block
primary_address
pre_address
streer
post_address
address_suffix
address_secondline
address_abrev
census_median_home_value
home_market_value
property_build+year
property_with_ac
property_with_pool
property_with_water
property_with_sewer
general_home_value
property_fuel_type
year
month
household_id
Census_median_household_income
household_size
marital_status
length+of_residence
number_of_kids
pre_school_kids
single_parents
working_women_in_house_hold
homeowner
children
adults
generations
net_worth
education_level
occupation
education_history
credit_lines
credit_card_user
newly_issued_credit_card_user
credit_range_new
credit_cards
loan_to_value
mortgage_loan2_amount
mortgage_loan_type
mortgage_loan2_type
mortgage_lender_code
mortgage_loan2_render_code
mortgage_lender
mortgage_loan2_lender
mortgage_loan2_ratetype
mortgage_rate
mortgage_loan2_rate
donor
investor
interest
buyer
hobby
personal_email
work_email
devices
phone
employee_title
employee_department
employee_job_function
skills
recent_job_change
company_id
company_name
company_description
technologies_used
office_address
office_city
office_country
office_state
office_zip5
office_zip4
office_carrier_route
office_latitude
office_longitude
office_cbsa_code
office_census_block_group
office_census_tract
office_county_code
company_phone
company_credit_score
company_csa_code
company_dpbc
company_franchiseflag
company_facebookurl
company_linkedinurl
company_twitterurl
company_website
company_fortune_rank
company_government_type
company_headquarters_branch
company_home_business
company_industry
company_num_pcs_used
company_num_employees
company_firm_individual
company_msa
company_msa_name
company_naics_code
company_naics_description
company_naics_code2
company_naics_description2
company_sic_code2
company_sic_code2_description
company_sic_code4
company_sic_code4_description
company_sic_code6
company_sic_code6_description
company_sic_code8
company_sic_code8_description
company_parent_company
company_parent_company_location
company_public_private
company_subsidiary_company
company_residential_business_code
company_revenue_at_side_code
company_revenue_range
company_revenue
company_sales_volume
company_small_business
company_stock_ticker
company_year_founded
company_minorityowned
company_female_owned_or_operated
company_franchise_code
company_dma
company_dma_name
company_hq_address
company_hq_city
company_hq_duns
company_hq_state
company_hq_zip5
company_hq_zip4
company_sect...
The latest estimates from the 2010/11 Taking Part adult survey produced by DCMS were released on 30 June 2011 according to the arrangements approved by the UK Statistics Authority.
30 June 2011
April 2010 to April 2011
National and Regional level data for England.
Further analysis of the 2010/11 adult dataset and data for child participation will be published on 18 August 2011.
The latest data from the 2010/11 Taking Part survey provides reliable national estimates of adult engagement with sport, libraries, the arts, heritage and museums & galleries. This release also presents analysis on volunteering and digital participation in our sectors and a look at cycling and swimming proficiency in England. The Taking Part survey is a continuous annual survey of adults and children living in private households in England, and carries the National Statistics badge, meaning that it meets the highest standards of statistical quality.
These spreadsheets contain the data and sample sizes for each sector included in the survey:
The previous Taking Part release was published on 31 March 2011 and can be found online.
This release is published in accordance with the Code of Practice for Official Statistics (2009), as produced by the UK Statistics Authority (UKSA) (http://www.statisticsauthority.gov.uk/). The UKSA has the overall objective of promoting and safeguarding the production and publication of official statistics that serve the public good. It monitors and reports on all official statistics, and promotes good practice in this area.
The document below contains a list of Ministers and Officials who have received privileged early access to this release of Taking Part data. In line with best practice, the list has been kept to a minimum and those given access for briefing purposes had a maximum of 24 hours.
The responsible statistician for this release is Neil Wilson. For any queries please contact the Taking Part team on 020 7211 6968 or takingpart@culture.gsi.gov.uk.
This paper explores the particular issues that COVID-19 has highlighted for Indigenous Peoples focusing on data for governance. Drawing on current global examples, we underscore the inclusion of Indigenous Peoples in COVID-19 activities as the basis of data-related policy recommendations to increase the use of timely, relevant data for decision-making while reducing risk and harms.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In drug development for rare diseases, the number of treated subjects in the clinical trial is often very small, whereas the number of external controls can be relatively large. There is no clear guidance on choosing an appropriate statistical method to control baseline confounding in this situation. To fill this gap, we conduct extensive simulations to evaluate the performance of commonly used matching and weighting methods as well as the more recently developed targeted maximum likelihood estimation (TMLE) and cardinality matching in small sample settings, mimicking the motivating data from a pediatric rare disease. Among the methods examined, the performance of coarsened exact matching (CEM) and TMLE is relatively robust under various model specifications. CEM is only feasible when the number of controls far exceeds the number of treated, whereas TMLE has better performance with less extreme treatment allocation ratios. Our simulations suggest that the bootstrap is useful for variance estimation in small samples after matching.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Reference to so-called real-world data is more often made in marketing authorization applications for medicines intended to diagnose, prevent or treat rare diseases compared to more common diseases. We provide granularity on the type and aim of any external data on efficacy aspects from both real-world data sources and external trial data as discussed in regulatory submissions of orphan designated medicinal products in the EU. By quantifying the contribution of external data according to various regulatory characteristics, we aimed at identifying specific opportunities for external data in the field of orphan conditions.
Methods: Information on external data in regulatory documents covering 72 orphan designations was extracted. Our sample comprised public assessment reports for approved, refused, or withdrawn applications concluded from 2019–2021 at the European Medicines Agency. Products with an active orphan designation at the time of submission were scrutinized regarding the role of external data on efficacy aspects in the context of marketing authorization applications, or on the criterion of “significant benefit” for the confirmation of the orphan designation at the time of licensing. The reports allowed a broad distinction between clinical development, regulatory decision making, and intended post-approval data collection. We defined three categories of external data, administrative data, structured clinical data, and external trial data (from clinical trials not sponsored by the applicant), and noted whether external data concerned the therapeutic context of the disease or the product under review.
Results: While reference to external data with respect to efficacy aspects was included in 63% of the approved medicinal products in the field of rare diseases, 37% of marketing authorization applications were exclusively based on the dedicated clinical development plan for the product under review. Purely administrative data did not play any role in our sample of reports, but clinical data collected in a structured manner (from routine care or clinical research) were often used to inform on the trial design. Two additional recurrent themes for the use of external data were the contextualization of results, especially to confirm the orphan designation at the time of licensing, and reassurance of a large difference in treatment effect size or consistency of effects observed in clinical trials and practice. External data on the product under review were restricted to either active substances already belonging to the standard of care even before authorization or to compassionate use schemes. Furthermore, external data were considered pivotal for marketing authorization only exceptionally and only for active substances already in use within the specific therapeutic indication. Applications for the rarest conditions and those without authorized treatment alternatives were especially prominent with respect to the use of external data from real-world data sources both in the pre- and post-approval setting.
Conclusion: Specific opportunities for external data in the setting of marketing authorizations in the field of rare diseases were identified. Ongoing initiatives of fostering systematic data collection are promising steps for a more efficient medicinal product development in the field of rare diseases.
The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other one with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, assets ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.
The full-population dataset (with about 10 million individuals) is also distributed as open data.
The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.
Household, Individual
The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.
ssd
The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
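The two-stage design described here can be sketched as follows. This is not the R script referred to above; it is a simplified Python illustration in which the stratum sizes, the largest-remainder rounding rule, and all names are assumptions. With 8,000 households and 25 per enumeration area, 320 enumeration areas are needed in total. How enumeration areas are drawn within a stratum is not specified in the text, so the sketch only allocates their number.

```python
import random

SAMPLE_SIZE = 8000
HOUSEHOLDS_PER_EA = 25
N_EAS = SAMPLE_SIZE // HOUSEHOLDS_PER_EA  # 320 enumeration areas in total

def allocate_eas(stratum_sizes, n_eas):
    """First stage: allocate enumeration areas to strata proportionally to stratum size."""
    total = sum(stratum_sizes.values())
    exact = {s: n_eas * size / total for s, size in stratum_sizes.items()}
    alloc = {s: int(x) for s, x in exact.items()}
    leftover = n_eas - sum(alloc.values())
    # Hand the remaining areas to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc

def sample_households(household_ids, k, rng):
    """Second stage: simple random sample of k households within one enumeration area."""
    return rng.sample(household_ids, k)

# Hypothetical stratum sizes (number of households) by province and urban/rural.
strata = {
    ("prov1", "urban"): 500_000, ("prov1", "rural"): 300_000,
    ("prov2", "urban"): 150_000, ("prov2", "rural"): 250_000,
}
allocation = allocate_eas(strata, N_EAS)
rng = random.Random(0)  # fixed seed so the draw is reproducible
picked = sample_households(list(range(120)), HOUSEHOLDS_PER_EA, rng)
print(allocation, len(picked))
```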
other
The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.
The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observations were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to produce the distributed data files.
This is a synthetic dataset; the "response rate" is 100%.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We conducted an analysis to confirm our observations that only a very small percentage of public research data is hosted in the Institutional Data Repositories, while the vast majority is published in the open domain-specific and generalist data repositories.
For this analysis, we selected 11 institutions, many of which have been our evaluation partners. For each institution, we counted the number of datasets published in their Institutional Data Repository (IDR) and tracked the number of public research datasets hosted in external data repositories via the Data Monitor API. External tracking was based on the corpus of 14+ million data records checked against the institutional SciVal ID. One institution didn’t have an IDR.
We found that 10 out of 11 institutions had most of their public research data hosted outside of their institution, where by research data we mean not only datasets, but a broader notion that includes, for example, software.
We will be happy to expand it by adding more institutions upon request.
Note: This is version 2 of the earlier published dataset. The number of datasets published and tracked in the Monash Institutional Data Repository has been updated based on the information provided by the Monash Library. The number of datasets in the NTU Institutional Data Repository now includes datasets only. Dataverses were excluded to avoid double counting.
Facebook
TwitterOur People data is gathered and aggregated via surveys, digital services, and public data sources. We use powerful profiling algorithms to collect and ingest only fresh and reliable data points.
Our comprehensive data enrichment solution includes a variety of data sets that can help you address gaps in your People data, gain a deeper understanding of your customers, and power superior client experiences. 1. Geography - City, State, ZIP, County, CBSA, Census Tract, etc. 2. Demographics - Gender, Age Group, Marital Status, Language etc. 3. Financial - Income Range, Credit Rating Range, Credit Type, Net worth Range, etc 4. Persona - Consumer type, Communication preferences, Family type, etc 5. Interests - Content, Brands, Shopping, Hobbies, Lifestyle etc. 6. Household - Number of Children, Number of Adults, IP Address, etc. 7. Behaviours - Brand Affinity, App Usage, Web Browsing etc. 8. Firmographics - Industry, Company, Occupation, Revenue, etc 9. Retail Purchase - Store, Category, Brand, SKU, Quantity, Price etc. 10. Auto - Car Make, Model, Type, Year, etc. 11. Housing - Home type, Home value, Renter/Owner, Year Built etc.
People Data Schema & Reach: Our data reach represents the total number of counts available within various categories and comprises attributes such as country location, MAU, DAU & Monthly Location Pings:
Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method on a suitable interval (daily/weekly/monthly).
People Data Use Cases:
360-Degree Customer View: Get a comprehensive picture of customers by aggregating internal and external data.
Data Enrichment: Leverage online-to-offline consumer profiles to build holistic audience segments and improve campaign targeting through user data enrichment.
Fraud Detection: Use multiple digital (web and mobile) identities to verify real users and detect anomalies or fraudulent activity.
Advertising & Marketing: Understand audience demographics, interests, lifestyle, hobbies, and behaviors to build targeted marketing campaigns.
Here's the schema of People Data:
person_id
first_name
last_name
age
gender
linkedin_url
twitter_url
facebook_url
city
state
address
zip
zip4
country
delivery_point_bar_code
carrier_route
walk_seuqence_code
fips_state_code
fips_country_code
country_name
latitude
longtiude
address_type
metropolitan_statistical_area
core_based+statistical_area
census_tract
census_block_group
census_block
primary_address
pre_address
streer
post_address
address_suffix
address_secondline
address_abrev
census_median_home_value
home_market_value
property_build+year
property_with_ac
property_with_pool
property_with_water
property_with_sewer
general_home_value
property_fuel_type
year
month
household_id
Census_median_household_income
household_size
marital_status
length+of_residence
number_of_kids
pre_school_kids
single_parents
working_women_in_house_hold
homeowner
children
adults
generations
net_worth
education_level
occupation
education_history
credit_lines
credit_card_user
newly_issued_credit_card_user
credit_range_new
credit_cards
loan_to_value
mortgage_loan2_amount
mortgage_loan_type
mortgage_loan2_type
mortgage_lender_code
mortgage_loan2_render_code
mortgage_lender
mortgage_loan2_lender
mortgage_loan2_ratetype
mortgage_rate
mortgage_loan2_rate
donor
investor
interest
buyer
hobby
personal_email
work_email
devices
phone
employee_title
employee_department
employee_job_function
skills
recent_job_change
company_id
company_name
company_description
technologies_used
office_address
office_city
office_country
office_state
office_zip5
office_zip4
office_carrier_route
office_latitude
office_longitude
office_cbsa_code
office_census_block_group
office_census_tract
office_county_code
company_phone
company_credit_score
company_csa_code
company_dpbc
company_franchiseflag
company_facebookurl
company_linkedinurl
company_twitterurl
company_website
company_fortune_rank
company_government_type
company_headquarters_branch
company_home_business
company_industry
company_num_pcs_used
company_num_employees
company_firm_individual
company_msa
company_msa_name
company_naics_code
company_naics_description
company_naics_code2
company_naics_description2
company_sic_code2
company_sic_code2_description
company_sic_code4
company_sic_code4_description
company_sic_code6
company_sic_code6_description
company_sic_code8
company_sic_code8_description
company_parent_company
company_parent_company_location
company_public_private
company_subsidiary_company
company_residential_business_code
company_revenue_at_side_code
company_revenue_range
company_revenue
company_sales_volume
company_small_business
company_stock_ticker
company_year_founded
company_minorityowned
company_female_owned_or_operated
company_franchise_code
company_dma
company_dma_name
company_hq_address
company_hq_city
company_hq_duns
company_hq_state
company_hq_zip5
company_hq_zip4
company_sec...
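As a rough illustration of the Data Enrichment use case, the sketch below joins a few schema fields onto internal customer records by matching on work_email. It assumes the data arrives as a CSV extract; the sample rows, the internal customer layout, and the helper names (normalize_email, build_index, enrich) are hypothetical and not part of the vendor's documentation.

```python
import csv
import io

# Hypothetical CSV extract; column names are taken from the schema above,
# the rows themselves are made up for illustration.
PEOPLE_CSV = """person_id,work_email,city,state,occupation
p1,ana@example.com,Austin,TX,Engineer
p2,raj@example.com,Boston,MA,Analyst
"""

# Hypothetical internal customer records (this layout is an assumption).
CUSTOMERS = [
    {"customer_id": "c1", "email": "Ana@Example.com"},
    {"customer_id": "c2", "email": "nobody@example.com"},
]

# Schema fields to copy onto each matched customer record.
ENRICH_FIELDS = ("city", "state", "occupation")


def normalize_email(value):
    """Lower-case and trim an email so that matching ignores case and padding."""
    return value.strip().lower()


def build_index(people_csv):
    """Index People rows by normalized work_email for constant-time lookup."""
    reader = csv.DictReader(io.StringIO(people_csv))
    return {normalize_email(row["work_email"]): row for row in reader}


def enrich(customers, index):
    """Return new customer dicts with enrichment fields and a match flag."""
    enriched = []
    for customer in customers:
        match = index.get(normalize_email(customer["email"]))
        extra = {f: (match[f] if match else None) for f in ENRICH_FIELDS}
        enriched.append({**customer, **extra, "matched": match is not None})
    return enriched


result = enrich(CUSTOMERS, build_index(PEOPLE_CSV))
for row in result:
    print(row)
```

Unmatched customers keep their original fields and receive None for each enrichment field, so downstream code can tell a missing match apart from an empty value.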
This dataset comprises a collection of example DMPs from a wide array of fields, obtained from a number of different sources outlined below. Data included or extracted from the examples include the discipline and field of study, author, institutional affiliation and funding information, location, date created, title, research and data type, description of project, link to the DMP, and, where possible, external links to related publications or grant pages. This CSV document serves as the content for a McMaster Data Management Plan (DMP) Database as part of the Research Data Management (RDM) Services website, located at https://u.mcmaster.ca/dmps. Other universities and organizations are encouraged to link to the DMP Database or use this dataset as the content for their own DMP Database. This dataset will be updated regularly to include new additions and will be versioned as such. We are gathering submissions at https://u.mcmaster.ca/submit-a-dmp to continue to expand the collection.
Census 2021 was the first census with a digital-first design, encouraging participants to respond online rather than on a paper questionnaire. Support was given to people who could not respond online, including paper questionnaires, telephone contact centres, field force support, and an extended collection period.
Topics covered in the 2021 UK Census included:
The 2021 Census: Safeguarded Household Microdata Sample dataset consists of a random sample of 1% of households from the 2021 Census and contains records for all individuals within these sampled households. It includes records for 263,729 households and 606,210 persons. These data cover England and Wales only. This sample allows linkage between individuals in the same household. The lowest level of geography is Wales and regions within England. It contains 56 variables and a low level of detail. This is a new ONS product following user feedback from the 2011 Census.
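The household linkage described above can be sketched roughly as follows. The two counts are the ones quoted in the description; the person records and the field names household_id and age_band are hypothetical placeholders, since the actual variable names of the microdata file are not given here.

```python
from collections import defaultdict

# Counts quoted in the dataset description.
HOUSEHOLDS = 263_729
PERSONS = 606_210

# Average number of person records per sampled household.
mean_household_size = PERSONS / HOUSEHOLDS

# Hypothetical person records; field names are illustrative assumptions.
PERSON_RECORDS = [
    {"person_id": 1, "household_id": "H1", "age_band": "25-34"},
    {"person_id": 2, "household_id": "H1", "age_band": "0-15"},
    {"person_id": 3, "household_id": "H2", "age_band": "65+"},
]


def group_by_household(records):
    """Link individuals by collecting the records that share a household id."""
    households = defaultdict(list)
    for record in records:
        households[record["household_id"]].append(record)
    return dict(households)


groups = group_by_household(PERSON_RECORDS)
household_sizes = {hid: len(members) for hid, members in groups.items()}

print(f"mean persons per sampled household: {mean_household_size:.2f}")
print(household_sizes)
```

Because the sample contains every individual in each sampled household, grouping on a shared household identifier is enough to recover household-level measures such as size or composition.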
Census Microdata
Microdata are small samples of individual records from a single census from which identifying information has been removed. They contain a range of individual and household characteristics and can be used to carry out analysis not possible from standard census outputs, such as:
The microdata samples are designed to protect the confidentiality of individuals and households. This is done by applying access controls and removing information that might directly identify a person, such as names, addresses and date of birth. Record swapping is applied to the census data used to create the microdata samples. This is a statistical disclosure control (SDC) method, which makes very small changes to the data to prevent the identification of individuals. The microdata samples use further SDC methods, such as collapsing variables and restricting detail. The samples also include records that have been edited to prevent inconsistent data and contain imputed persons, households, and data values. To protect confidentiality, imputation flags are not included in any 2021 Census microdata sample.
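To give a feel for what collapsing a variable means, here is a minimal sketch that maps a single year of age to a coarse band with a top-coded upper group. The band width, the top code, and the function name collapse_age are assumptions chosen for illustration; they are not the banding ONS actually applies.

```python
def collapse_age(age, width=10, top_code=80):
    """Map a single year of age to a coarse band.

    Ages at or above top_code share one open-ended band, which hides
    rare high values. width and top_code are illustrative defaults.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= top_code:
        return f"{top_code}+"
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


# Detailed values go in, coarser categories come out.
ages = [0, 37, 79, 85]
bands = [collapse_age(a) for a in ages]
print(bands)
```

Fewer, broader categories mean more people share each combination of values, which is the general reason collapsing reduces the risk of identifying an individual.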