Bytemine offers access to over 100 million verified personal email addresses for US consumers and professionals. This extensive B2C contact database is designed to support modern outreach, digital marketing, lead generation, and customer engagement across channels that reach people where they are most responsive: their personal inbox.
Unlike traditional work email databases that limit outreach to business hours or corporate filters, personal emails enable more flexible, direct, and often higher-converting communication. Whether you're running direct-to-consumer campaigns, re-engaging inactive users, or enriching existing contact records, Bytemine provides the scale and data quality you need to connect effectively.
Our personal email dataset includes:
- 100 million+ verified personal email addresses (Gmail, Yahoo, Outlook, etc.)
- Matched with names, phone numbers, location, and demographic attributes
- 50+ enriched fields including age range, gender, location, occupation, and consumer behavior signals
- Optional inclusion of job title, company, and professional details for dual B2B-B2C targeting
All emails are verified and regularly updated to ensure deliverability, reduce bounce rates, and improve sender reputation. Contacts are sourced through direct data licensing agreements with consumer platforms, B2C applications, and verified aggregators, ensuring compliance and reliability.
This data is ideal for:
- B2C marketing campaigns (email newsletters, promotions, lifecycle emails)
- Direct-to-consumer product launches and brand activations
- Customer re-engagement and loyalty campaigns
- Lookalike audience creation for paid media
- CRM enrichment with consumer-facing contact info
- Identity resolution and cross-channel targeting
- Data onboarding for ad platforms or audience segmentation
- Consumer surveys, polling, and research
Bytemine’s personal email dataset empowers your marketing, growth, and data teams with clean, structured, and highly scalable contact information. Each record can be enriched with behavioral and demographic data, enabling advanced personalization and segmentation strategies.
Access is available through:
With flexible delivery options and scalable pricing, Bytemine supports startups, growth teams, agencies, and enterprise platforms looking to expand their reach and drive performance with verified consumer data.
If you're looking to power outreach across consumer inboxes, enrich B2C data, or build a scalable, compliant contact database, Bytemine’s personal email dataset is the fastest way to connect with real people across the United States.
I collected the data largely using OpenAI.
Celebrity - Their stage name.
Name - Their birth name.
Nationality - Where they were born, given as a standard two-letter country code.
Gender - Their gender.
Estimated Net Worth - Not gathered using AI. I searched Google, and when it returned an estimated range (e.g., 80 million to 100 million), I recorded the lowest amount given (80 million in that example).
Age at End of 2023 - Their age on 12/31/23.
Birth Date - Their birthday in mm/dd/yyyy format.
Birth Month - The month they were born in.
Birth Day - The day of the month they were born on.
Birth Year - The year they were born.
Industry - The industries they operate in.
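The derived date columns can be reproduced from Birth Date. The sketch below is minimal and makes two assumptions: Birth Date is stored as an `mm/dd/yyyy` string, and Birth Month is numeric. The function names are illustrative and are not part of the dataset. The reference date is December 31, so every person has already had their 2023 birthday, and Age at End of 2023 reduces to 2023 minus Birth Year. The general birthday check is kept so the helper also works for other reference dates.

```python
from datetime import date, datetime


def parse_birth_date(text: str) -> date:
    """Parse a Birth Date value in mm/dd/yyyy format, e.g. '07/15/1990'."""
    return datetime.strptime(text, "%m/%d/%Y").date()


def split_birth_date(text: str) -> tuple[int, int, int]:
    """Return (Birth Month, Birth Day, Birth Year) derived from Birth Date."""
    d = parse_birth_date(text)
    return d.month, d.day, d.year


def age_on(birth: date, reference: date = date(2023, 12, 31)) -> int:
    """Completed years of age on `reference` (12/31/2023 by default)."""
    # Subtract one year if the birthday has not yet occurred in the reference year.
    had_birthday = (reference.month, reference.day) >= (birth.month, birth.day)
    return reference.year - birth.year - (0 if had_birthday else 1)


print(split_birth_date("07/15/1990"))          # (7, 15, 1990)
print(age_on(parse_birth_date("07/15/1990")))  # 33
```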
What you can analyze:
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
**🌍 World Countries Dataset**
This World Countries Dataset contains detailed information about countries across the globe, offering insights into their geographic, demographic, and economic characteristics.
It includes various features such as population, area, GDP, languages, and regional classifications. This dataset is ideal for projects related to data visualization, statistical analysis, geographical studies, or machine learning applications such as clustering or classification of countries.
This dataset was manually compiled from reliable open data sources (e.g., Wikipedia, the World Bank, or other government datasets).
**🔍 Sample Questions Explored Using Python:**
- Q. 1) Which countries have the highest and lowest population?
- Q. 2) What is the average area (in sq. km) of countries in each region?
- Q. 3) Which countries have more than 100 million population and GDP above $1 trillion?
- Q. 4) Which languages are most commonly spoken across countries?
- Q. 5) Show a bar graph comparing GDPs of G7 nations.
- Q. 6) How many countries are there in each continent or region?
- Q. 7) Which countries have both a high population density and low GDP per capita?
- Q. 8) Create a world map visualization of population or GDP distribution.
- Q. 9) What are the top 10 most densely populated countries?
- Q. 10) How many landlocked countries are there in the world?
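Several of the sample questions can be answered with the standard library alone. This is a minimal sketch built on three assumptions:

- The data is a CSV file whose headers match the column names listed below (`Country`, `Region`, `Population`, `Area (sq. km)`, `GDP (USD)`, `Population Density`, `Landlocked`).
- Numeric cells are plain numbers, possibly with thousands separators.
- The file name `world_countries.csv` and all helper names are hypothetical.

```python
import csv
from collections import defaultdict


def load_rows(path: str) -> list[dict[str, str]]:
    """Read the CSV into a list of dicts keyed by column header."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def num(row: dict[str, str], column: str) -> float:
    """Parse a numeric cell, tolerating thousands separators such as '1,234'."""
    return float(row[column].replace(",", ""))


def population_extremes(rows: list[dict[str, str]]) -> tuple[str, str]:
    """Q. 1) Most and least populous countries."""
    by_pop = sorted(rows, key=lambda r: num(r, "Population"))
    return by_pop[-1]["Country"], by_pop[0]["Country"]


def mean_area_by_region(rows: list[dict[str, str]]) -> dict[str, float]:
    """Q. 2) Average area (sq. km) of the countries in each region."""
    areas: dict[str, list[float]] = defaultdict(list)
    for r in rows:
        areas[r["Region"]].append(num(r, "Area (sq. km)"))
    return {region: sum(v) / len(v) for region, v in areas.items()}


def large_and_rich(rows, min_pop: float = 100e6, min_gdp: float = 1e12) -> list[str]:
    """Q. 3) Countries with more than 100 million people and GDP above $1 trillion."""
    return [r["Country"] for r in rows
            if num(r, "Population") > min_pop and num(r, "GDP (USD)") > min_gdp]


def densest(rows, n: int = 10) -> list[str]:
    """Q. 9) The n most densely populated countries."""
    ranked = sorted(rows, key=lambda r: num(r, "Population Density"), reverse=True)
    return [r["Country"] for r in ranked[:n]]


def count_landlocked(rows) -> int:
    """Q. 10) Number of countries whose Landlocked value is 'Yes'."""
    return sum(1 for r in rows if r["Landlocked"].strip().lower() == "yes")
```

For example, after `rows = load_rows("world_countries.csv")`, calling `densest(rows)` returns the ten names for Q. 9.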
**🧾 Features / Columns in the Dataset:**
- Country: The name of the country (e.g., "Pakistan", "France").
- Capital: The capital city of the country.
- Region: Broad geographical region (e.g., "Asia", "Europe").
- Subregion: More specific geographical grouping (e.g., "Southern Asia").
- Population: Total population of the country.
- Area (sq. km): Total land area in square kilometers.
- Population Density: Number of people per square kilometer.
- GDP (USD): Gross Domestic Product (in U.S. dollars).
- GDP per Capita: GDP divided by the population.
- Official Languages: Officially recognized language(s) spoken.
- Currency: Name of the currency used.
- Timezones: Timezones in which the country falls.
- Borders: List of bordering countries (if any).
- Landlocked: Whether the country is landlocked (Yes/No).
- Latitude / Longitude: Coordinates for geographical plotting.
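Population Density and GDP per Capita are derived from other columns, so they can be recomputed as a consistency check. This is a minimal sketch. The 1% relative tolerance is an assumption: figures compiled from different sources or years rarely agree exactly.

```python
import math


def population_density(population: float, area_km2: float) -> float:
    """Population Density: people per square kilometer."""
    return population / area_km2


def gdp_per_capita(gdp_usd: float, population: float) -> float:
    """GDP per Capita: GDP (USD) divided by population."""
    return gdp_usd / population


def matches(stored: float, recomputed: float, rel_tol: float = 0.01) -> bool:
    """True if a stored derived value is within rel_tol of the recomputed value."""
    return math.isclose(stored, recomputed, rel_tol=rel_tol)


# Illustrative numbers, not taken from the dataset:
print(population_density(50_000_000, 250_000))  # 200.0
print(gdp_per_capita(1e12, 50_000_000))          # 20000.0
```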
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The baseline sweep of the 100 Million Brazilian Cohort comprises individuals registered for the first time in the National Unified Register for Social Programmes (Cadastro Único para Programas Sociais, or Cadastro Único). Cadastro Único identifies low-income families who have applied for social assistance in Brazil and includes individuals who have applied to receive any social benefit since 2001. The baseline dataset includes 131,697,800 individuals (about 62% of the Brazilian population) who entered at different times between 2001 and 2018. Of the cohort, 55.8% identified themselves as Brown, 30.7% as White, 6.6% as Black, 0.6% as Indigenous, and 0.4% as Asian.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The CIFAR-10 and CIFAR-100 datasets are labeled subsets of the 80 million tiny images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.
* More info on CIFAR-100: https://www.cs.toronto.edu/~kriz/cifar.html
* TensorFlow listing of the dataset: https://www.tensorflow.org/datasets/catalog/cifar100
* GitHub repo for converting CIFAR-100 tarball files to png format: https://github.com/knjcode/cifar2png
The CIFAR-10 dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images [in the original dataset].
CIFAR-100 is just like CIFAR-10, except that it has 100 classes containing 600 images each. There are 500 training images and 100 test images per class. The 100 classes in CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs). However, this project does not contain the superclasses.
* Superclasses version: https://universe.roboflow.com/popular-benchmarks/cifar100-with-superclasses/
More background on the dataset:
Figure: CIFAR-100 dataset classes and superclasses (https://i.imgur.com/5w8A0Vm.png)
The dataset is split into only a train set (83.33% of images, 50,000 images) and a test set (16.67% of images, 10,000 images). The train set was split to provide 80% of its images to the training set (approximately 40,000 images) and 20% of its images to the validation set (approximately 10,000 images).
@TECHREPORT{Krizhevsky09learningmultiple,
author = {Alex Krizhevsky},
title = {Learning multiple layers of features from tiny images},
institution = {},
year = {2009}
}
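The 80/20 split of the 50,000 training images into training and validation sets can be sketched as a seeded random split over image indices. This illustrates the idea only and is not necessarily the procedure used for this project. The seed and the function name are assumptions. Using a seeded `random.Random` makes the split reproducible.

```python
import random


def train_val_split(indices: list[int], val_fraction: float = 0.2, seed: int = 0):
    """Shuffle a copy of `indices` and hold out `val_fraction` of them for validation."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]  # (train, validation)


train_idx, val_idx = train_val_split(list(range(50_000)))
print(len(train_idx), len(val_idx))  # 40000 10000
```

If every class should keep exactly 400 training and 100 validation images, apply the same function separately to each fine label's 500 indices (a stratified split).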
CC0 1.0 Universal: https://spdx.org/licenses/CC0-1.0.html
This is the dataset for the study of "Social dilemma in the excess use of antimicrobials incurring antimicrobial resistance". The emergence of antimicrobial resistance (AMR) caused by the excess use of antimicrobials has come to be recognized as a global threat to public health. There is a ‘tragedy of the commons’ type social dilemma behind this excessive use of antimicrobials, which should be recognized by all stakeholders. To address this global threat, we thus surveyed eight countries/areas to determine whether people recognize this dilemma and showed that although more than half of the population pays little, if any, attention to it, almost 20% recognize this social dilemma, and 15–30% of those have a positive attitude toward solving that dilemma. We suspect that increasing individual awareness of this social dilemma contributes to decreasing the frequency of AMR emergencies.
Methods
We designed a questionnaire to observe a social dilemma in the excess use of antimicrobials incurring antimicrobial resistance by placing two types of imaginary artificial-intelligence (AI) physicians who perform medical practice from either an individual or societal perspective. We assume two AI medical diagnosis systems: “Individual precedence AI” (abbreviated Individual-AI) and “World precedence AI” (abbreviated World-AI). Both AIs diagnose and prescribe medicine automatically. The Individual-AI system diagnoses patients and prescribes medicine to prevent infections based on an individual perspective, including all prophylactic prescriptions against rare accidental infections (not yet present and unlikely to occur). It does not consider the global risk of AMR in the decision. The World-AI system, instead, takes into account the global mortality rate of AMR, aiming to reduce the total number of all AMR-related deaths. Because of this, this AI system does not prescribe antimicrobials against rare and not-yet-present infections.
This questionnaire design allows us to observe the social dilemma. For example, it shows a typical social dilemma caused by preferring the use of Individual-AI for diagnosing oneself but preferring the use of World-AI for diagnosing strangers.
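The response pattern described above can be made concrete by labeling each respondent's pair of choices. This is a minimal sketch. The string encoding of the choices and the category names are hypothetical and are not the study's coding scheme.

```python
from collections import Counter

INDIVIDUAL, WORLD = "Individual-AI", "World-AI"  # hypothetical encoding of the two options


def classify(for_self: str, for_stranger: str) -> str:
    """Label one respondent from the AI chosen for oneself and for strangers."""
    if for_self == INDIVIDUAL and for_stranger == WORLD:
        return "social dilemma"  # individual benefit for oneself, societal benefit for others
    if for_self == for_stranger:
        return "consistent individual" if for_self == INDIVIDUAL else "consistent societal"
    return "reverse"  # World-AI for oneself, Individual-AI for strangers


def tally(responses: list[tuple[str, str]]) -> Counter:
    """Count respondents in each category."""
    return Counter(classify(s, o) for s, o in responses)


print(tally([(INDIVIDUAL, WORLD), (WORLD, WORLD), (INDIVIDUAL, WORLD)]))
```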
The survey entitled “Survey on Medical Advancement” was administered in 8 countries/areas. The survey was conducted 4 times. For the two surveys in Japan, an internet survey company, Cross Marketing Inc. (https://www.cross-m.co.jp/en/), created the questionnaire webpages based on our study design. The company also collected the data. As of April 2020, Cross Marketing Inc. had 4.79 million people in an active panel (survey participants who registered in advance). Here, an active panel member is defined as a survey respondent who has been active within the last year. For the panels, the questionnaire and response column were displayed on the website through which the respondents could complete and submit their responses. We extracted 500 submissions for each gender and each age group by random sampling from all samples collected during the survey periods. The surveys in the 7 countries/areas (i.e., the United States, the United Kingdom, Sweden, Taiwan, Australia, Brazil, and Russia) were conducted by Cint (https://www.cint.com/). Cint is the world’s largest consumer network for digital survey-based research. The company is headquartered in Sweden. Cint maintains a survey platform that had more than 100 million consumer panelists in over 80 countries as of May 2020. For surveys in the US, UK, Sweden, Taiwan, Australia, Brazil, and Russia, Cint Japan (https://jp.cint.com/), the Japanese distributor of Cint, created translated questionnaire webpages based on our study design. The company also collected the data. We extracted at least 500 (US, UK, SWE, BRA, RUS) or 250 (TWN, AUS) submissions for each gender (male and female) and each age group (20s, 30s, 40s, 50s, and 60s) by random sampling from all samples collected during the survey periods. Note that both companies eliminated inconsistent or apathetic respondents.
For example, respondents with inconsistent responses (e.g., the respondent's registered age differed from the age reported at the time of the survey) were eliminated before the data reached the authors. In addition, respondents with extremely short response times (i.e., shorter than 1 min) were eliminated because they may not have read the questions carefully.
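The exclusion rules and the per-cell random sampling can be sketched together. In the study, the survey companies applied the exclusions and the authors then sampled. This sketch combines both steps for illustration only. The `Response` fields and helper names are hypothetical. Treating any mismatch between registered and reported age as inconsistent follows the example above literally.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Response:
    gender: str          # "male" or "female"
    age_group: str       # "20s", "30s", "40s", "50s", or "60s"
    registered_age: int  # age held in the panel company's records
    reported_age: int    # age stated in the survey
    duration_sec: float  # time taken to complete the questionnaire


def is_valid(r: Response) -> bool:
    """Drop inconsistent ages and responses completed in under one minute."""
    return r.registered_age == r.reported_age and r.duration_sec >= 60


def stratified_sample(responses: list[Response], per_stratum: int, seed: int = 0) -> list[Response]:
    """Randomly draw up to per_stratum valid responses from each (gender, age group) cell."""
    strata: dict[tuple[str, str], list[Response]] = defaultdict(list)
    for r in responses:
        if is_valid(r):
            strata[(r.gender, r.age_group)].append(r)
    rng = random.Random(seed)
    sample: list[Response] = []
    for key in sorted(strata):
        cell = strata[key]
        sample.extend(rng.sample(cell, min(per_stratum, len(cell))))
    return sample
```

In the study's terms, this would be called with `per_stratum=500`, or 250 for Taiwan and Australia.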
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Money Supply M0 in the United States increased to 53615000 USD Million in October from 5478000 USD Million in September of 2025. This dataset provides - United States Money Supply M0 - actual values, historical data, forecast, chart, statistics, economic calendar and news.
The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other one with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, assets ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.
The full-population dataset (with about 10 million individuals) is also distributed as open data.
The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.
Units of analysis: household, individual.
The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.
Kind of data: sample survey data (ssd).
The sample size was set to 8,000 households, with a fixed 25 households selected from each enumeration area. In the first stage, the number of enumeration areas to be selected in each stratum was calculated in proportion to the size of the stratum (stratification by geo_1 and urban/rural). In the second stage, 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
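The two-stage design can be sketched as follows. With 25 households per enumeration area (EA), 8,000 households require 8,000 / 25 = 320 EAs, which are shared among the geo_1 × urban/rural strata in proportion to stratum size. This is a Python illustration, not the R script distributed with the dataset. Three things are assumptions: stratum size is measured in households, leftover EAs are assigned by largest-remainder rounding, and the helper names are invented for the sketch. The method used to pick specific EAs within a stratum is not described here, so it is left out.

```python
import math
import random

HOUSEHOLDS_PER_EA = 25
TOTAL_EAS = 8_000 // HOUSEHOLDS_PER_EA  # 320 enumeration areas


def allocate_eas(stratum_size: dict[tuple[str, str], int],
                 total_eas: int = TOTAL_EAS) -> dict[tuple[str, str], int]:
    """Stage 1: number of EAs per (geo_1, urban/rural) stratum, proportional to size."""
    total = sum(stratum_size.values())
    quotas = {s: total_eas * n / total for s, n in stratum_size.items()}
    alloc = {s: math.floor(q) for s, q in quotas.items()}
    # Largest-remainder rounding: hand leftover EAs to the largest fractional parts.
    leftover = total_eas - sum(alloc.values())
    for s in sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc


def select_households(ea_households: list[str], seed: int = 0) -> list[str]:
    """Stage 2: simple random sample of 25 households within one selected EA."""
    return random.Random(seed).sample(ea_households, HOUSEHOLDS_PER_EA)
```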
other
The dataset is a synthetic dataset. Although it contains variables typically collected in sample surveys or population censuses, no questionnaire is available for this dataset. However, a "fake" questionnaire was created for the sample dataset extracted from this dataset, to be used as training material.
The synthetic data generation process included a set of "validators" (consistency checks against which synthetic observations were assessed and rejected or replaced when needed). Some post-processing was also applied to produce the distributed data files.
This is a synthetic dataset; the "response rate" is 100%.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The economic indicators in this dataset are gross domestic product (GDP, in 100 million yuan), per-capita GDP (yuan per person), primary industry (100 million yuan), secondary industry (100 million yuan), tertiary industry (100 million yuan), and total investment in fixed assets (100 million yuan). Time-series data from 1949 to 2013 are included for China as a whole and for all provinces. All data were collected from the China Statistical Yearbook (1981 to 2014) and the China Compendium of Statistics (1949 to 2008). These data are not intended for demarcation.
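GDP measured by the production approach is the sum of the value added of the primary, secondary, and tertiary industries. That means the three industry columns can be turned into a structural breakdown and checked against the GDP column. This is a minimal sketch. The 0.5% tolerance (to allow for rounding in published tables) and the function names are assumptions. All inputs are in units of 100 million yuan.

```python
import math

YUAN_PER_UNIT = 100_000_000  # the dataset reports these columns in units of 100 million yuan


def industry_shares(primary: float, secondary: float, tertiary: float) -> dict[str, float]:
    """Share of each industry in the combined output of the three industries."""
    total = primary + secondary + tertiary
    return {"primary": primary / total,
            "secondary": secondary / total,
            "tertiary": tertiary / total}


def sums_to_gdp(gdp: float, primary: float, secondary: float, tertiary: float,
                rel_tol: float = 0.005) -> bool:
    """Check that the three industries add up to GDP within a rounding tolerance."""
    return math.isclose(primary + secondary + tertiary, gdp, rel_tol=rel_tol)


def to_yuan(value_in_100m: float) -> float:
    """Convert a value from units of 100 million yuan to yuan."""
    return value_in_100m * YUAN_PER_UNIT
```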
Success.ai’s User Profiles Data for Nonprofit and NGO Leaders provides businesses, organizations, and researchers with comprehensive access to global leaders in the nonprofit and NGO sectors. With data sourced from over 700 million verified LinkedIn profiles, this dataset includes actionable insights and contact details for executives, program managers, administrators, and decision-makers. Whether your goal is to partner with nonprofits, support global causes, or conduct research into social impact, Success.ai ensures your outreach is backed by accurate, enriched, and continuously updated data.
Why Choose Success.ai’s User Profiles Data for Nonprofit and NGO Leaders?
- Comprehensive Professional Profiles: Access verified LinkedIn profiles of nonprofit leaders, NGO managers, program directors, grant writers, and administrative executives. AI-driven validation ensures 99% accuracy for efficient communication and minimized bounce rates.
- Global Coverage Across Nonprofit Sectors: Includes profiles from nonprofits, humanitarian organizations, environmental groups, social enterprises, and advocacy organizations. Covers key markets across North America, Europe, APAC, South America, and Africa for global reach.
- Continuously Updated Dataset: Reflects real-time professional updates, organizational changes, and emerging trends in the nonprofit landscape to keep your targeting relevant and effective.
- Tailored for Nonprofit Insights: Enriched profiles include work histories, organizational affiliations, areas of expertise, and social impact projects for deeper engagement opportunities.

Data Highlights:
- 700M+ Verified LinkedIn Profiles: Access a vast network of nonprofit and NGO professionals worldwide.
- 100M+ Work Emails: Direct communication with executives, managers, and decision-makers in the nonprofit sector.
- Enriched Organizational Data: Gain insights into leadership structures, mission focuses, and operational scales.
- Industry-Specific Segmentation: Target nonprofits focused on healthcare, education, environmental sustainability, human rights, and more.

Key Features of the Dataset:
- Nonprofit and NGO Leader Profiles: Identify and connect with executives, program managers, fundraisers, and policy directors in global nonprofit and NGO sectors. Engage with individuals who drive decision-making and operational strategies for impactful organizations.
- Detailed Organizational Insights: Leverage firmographic data, including organizational size, mission, regional activity, and funding sources, to align with specific nonprofit goals.
- Advanced Filters for Precision Targeting: Refine searches by region, mission type, role, or organizational focus for tailored outreach. Customize campaigns based on social impact priorities, such as climate action, gender equality, or economic development.
- AI-Driven Enrichment: Enhanced datasets provide actionable insights into professional accomplishments, partnerships, and leadership achievements for targeted engagement.

Strategic Use Cases:
- Partnership Development and Outreach: Identify nonprofits and NGOs for collaboration on social impact projects, sponsorships, or grant distribution. Build relationships with decision-makers driving advocacy, fundraising, and community initiatives.
- Donor Engagement and Fundraising: Target nonprofit leaders responsible for managing fundraising campaigns and donor relationships. Tailor outreach efforts to align with specific causes and funding priorities.
- Research and Analysis: Analyze leadership trends, mission focuses, and regional nonprofit activities to inform program design and funding strategies. Use insights to evaluate the effectiveness of social impact initiatives and partnerships.
- Recruitment and Talent Acquisition: Target HR professionals and administrators seeking qualified staff, consultants, or volunteers for nonprofits and NGOs. Offer talent solutions for specialized roles in program management, advocacy, and administration.

Why Choose Success.ai?
- Best Price Guarantee: Access industry-leading, verified User Profiles Data at unmatched pricing to ensure your campaigns are cost-effective and impactful.
- Seamless Integration: Easily integrate verified nonprofit data into your CRM or marketing platforms with APIs or downloadable formats.
- AI-Validated Accuracy: Rely on 99% accuracy to minimize wasted outreach efforts and maximize engagement outcomes.
- Customizable Solutions: Tailor datasets to focus on specific nonprofit types, geographical regions, or areas of social impact to meet your strategic objectives.

Strategic APIs for Enhanced Campaigns:
- Data Enrichment API: Update your internal records with verified nonprofit leader profiles to enhance targeting and engagement.
- Lead Generation API: Automate lead generation for a consistent pipeline of nonprofit and NGO professionals, scaling your outreach efforts efficiently.

Success.ai’s User Profiles Data for Nonprofit and NGO Leader...
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
NYC Open Data is an opportunity to engage New Yorkers in the information that is produced and used by City government. We believe that every New Yorker can benefit from Open Data, and Open Data can benefit from every New Yorker. Source: https://opendata.cityofnewyork.us/overview/
Thanks to NYC Open Data, which makes public data generated by city agencies available for public use, and Citi Bike, we've incorporated over 150 GB of data in 5 open datasets into Google BigQuery Public Datasets, including:
* Over 8 million 311 service requests from 2012-2016
* More than 1 million motor vehicle collisions from 2012-present
* Citi Bike stations and 30 million Citi Bike trips from 2013-present
* Over 1 billion Yellow and Green Taxi rides from 2009-present
* Over 500,000 sidewalk trees surveyed decennially in 1995, 2005, and 2015
This dataset is deprecated and not being updated.
https://opendata.cityofnewyork.us/
This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://data.cityofnewyork.us/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
By accessing datasets and feeds available through NYC Open Data, the user agrees to all of the Terms of Use of NYC.gov as well as the Privacy Policy for NYC.gov. The user also agrees to any additional terms of use defined by the agencies, bureaus, and offices providing data. Public data sets made available on NYC Open Data are provided for informational purposes. The City does not warranty the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set made available on NYC Open Data, nor are any such warranties to be implied or inferred with respect to the public data sets furnished therein.
The City is not liable for any deficiencies in the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set, or application utilizing such data set, provided by any third party.
Banner photo by @bicadmedia from Unsplash.
On which New York City streets are you most likely to find a loud party?
Can you find the Virginia Pines in New York City?
Where was the only collision caused by an animal that injured a cyclist?
What’s the Citi Bike record for the Longest Distance in the Shortest Time (on a route with at least 100 rides)?
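One way to approach the last question is to treat "longest distance in the shortest time" as the highest speed. Speed here means the straight-line distance between the two stations divided by the fastest ride on that route, counting only routes with at least 100 rides. This is a minimal sketch over trip records already loaded into Python. Several things are assumptions rather than part of the BigQuery table schema: this interpretation of the question, the record keys (`start_station`, `end_station`, `start_lat`, `start_lon`, `end_lat`, `end_lon`, `duration_sec`), and the function names. Note also that the haversine distance understates the distance actually ridden.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fastest_route(trips: list[dict], min_rides: int = 100):
    """Return (start, end, km_per_hour) for the qualifying route with the highest speed."""
    routes: dict[tuple, list[dict]] = defaultdict(list)
    for t in trips:
        routes[(t["start_station"], t["end_station"])].append(t)
    best = None
    for (start, end), rides in routes.items():
        durations = [r["duration_sec"] for r in rides if r["duration_sec"] > 0]
        if len(rides) < min_rides or not durations:
            continue
        r = rides[0]  # station coordinates are the same for every ride on a route
        km = haversine_km(r["start_lat"], r["start_lon"], r["end_lat"], r["end_lon"])
        speed = km / (min(durations) / 3600)
        if best is None or speed > best[2]:
            best = (start, end, speed)
    return best
```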
Image: https://cloud.google.com/blog/big-data/2017/01/images/148467900588042/nyc-dataset-6.png
Success.ai’s Education Industry Data provides access to comprehensive profiles of global professionals in the education sector. Sourced from over 700 million verified LinkedIn profiles, this dataset includes actionable insights and verified contact details for teachers, school administrators, university leaders, and other decision-makers. Whether your goal is to collaborate with educational institutions, market innovative solutions, or recruit top talent, Success.ai ensures your efforts are supported by accurate, enriched, and continuously updated data.
Why Choose Success.ai’s Education Industry Data?
1. Comprehensive Professional Profiles: Access verified LinkedIn profiles of teachers, school principals, university administrators, curriculum developers, and education consultants. AI-validated profiles ensure 99% accuracy, reducing bounce rates and enabling effective communication.
2. Global Coverage Across Education Sectors: Includes professionals from public schools, private institutions, higher education, and educational NGOs. Covers markets across North America, Europe, APAC, South America, and Africa for a truly global reach.
3. Continuously Updated Dataset: Real-time updates reflect changes in roles, organizations, and industry trends, ensuring your outreach remains relevant and effective.
4. Tailored for Educational Insights: Enriched profiles include work histories, academic expertise, subject specializations, and leadership roles for a deeper understanding of the education sector.
Data Highlights:
- 700M+ Verified LinkedIn Profiles: Access a global network of education professionals.
- 100M+ Work Emails: Direct communication with teachers, administrators, and decision-makers.
- Enriched Professional Histories: Gain insights into career trajectories, institutional affiliations, and areas of expertise.
- Industry-Specific Segmentation: Target professionals in K-12 education, higher education, vocational training, and educational technology.
Key Features of the Dataset:
1. Education Sector Profiles: Identify and connect with teachers, professors, academic deans, school counselors, and education technologists. Engage with individuals shaping curricula, institutional policies, and student success initiatives.
2. Detailed Institutional Insights: Leverage data on school sizes, student demographics, geographic locations, and areas of focus. Tailor outreach to align with institutional goals and challenges.
3. Advanced Filters for Precision Targeting: Refine searches by region, subject specialty, institution type, or leadership role. Customize campaigns to address specific needs, such as professional development or technology adoption.
4. AI-Driven Enrichment: Enhanced datasets include actionable details for personalized messaging and targeted engagement. Highlight educational milestones, professional certifications, and key achievements.
Strategic Use Cases:
1. Product Marketing and Outreach: Promote educational technology, learning platforms, or training resources to teachers and administrators. Engage with decision-makers driving procurement and curriculum development.
2. Collaboration and Partnerships: Identify institutions for collaborations on research, workshops, or pilot programs. Build relationships with educators and administrators passionate about innovative teaching methods.
3. Talent Acquisition and Recruitment: Target HR professionals and academic leaders seeking faculty, administrative staff, or educational consultants. Support hiring efforts for institutions looking to attract top talent in the education sector.
4. Market Research and Strategy: Analyze trends in education systems, curriculum development, and technology integration to inform business decisions. Use insights to adapt products and services to evolving educational needs.
Why Choose Success.ai?
1. Best Price Guarantee: Access industry-leading Education Industry Data at unmatched pricing for cost-effective campaigns and strategies.
2. Seamless Integration: Easily integrate verified data into CRMs, recruitment platforms, or marketing systems using downloadable formats or APIs.
3. AI-Validated Accuracy: Depend on 99% accurate data to reduce wasted outreach and maximize engagement rates.
4. Customizable Solutions: Tailor datasets to specific educational fields, geographic regions, or institutional types to meet your objectives.
Strategic APIs for Enhanced Campaigns:
1. Data Enrichment API: Enrich existing records with verified education professional profiles to enhance engagement and targeting.
2. Lead Generation API: Automate lead generation for a consistent pipeline of qualified professionals in the education sector.

Success.ai’s Education Industry Data enables you to connect with educators, administrators, and decision-makers transforming global...
Success.ai’s Fashion & Apparel Data for Apparel, Fashion & Luxury Goods Professionals in Asia provides a robust dataset tailored for businesses seeking to connect with key players in Asia’s thriving fashion and luxury goods industries. Covering roles such as brand managers, designers, retail executives, and supply chain leaders, this dataset includes verified contact details, professional insights, and actionable business data.
With access to over 700 million verified global profiles and 130 million profiles focused on Asia, Success.ai ensures your outreach, marketing, and business development strategies are supported by accurate, continuously updated, and AI-validated data. Backed by our Best Price Guarantee, this solution positions you to succeed in Asia’s competitive and ever-growing fashion markets.
Why Choose Success.ai’s Fashion & Apparel Data?
Verified Contact Data for Precision Outreach
Comprehensive Coverage of Asian Fashion Professionals
Continuously Updated Datasets
Ethical and Compliant
Data Highlights:
Key Features of the Dataset:
Comprehensive Professional Profiles
Advanced Filters for Precision Campaigns
Industry and Regional Insights
AI-Driven Enrichment
Strategic Use Cases:
Marketing Campaigns and Brand Expansion
Product Development and Consumer Insights
Partnership Development and Retail Collaboration
Market Research and Competitive Analysis
Why Choose Success.ai?
Best Price Guarantee
Seamless Integration
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Visual question answering (VQA) takes an image and a natural-language question about it, analyzes (or reasons about) the image, and returns an answer in natural language.
As part of T-Brain’s social-value projects, the KVQA dataset, a Korean version of the VQA dataset, was created. The KVQA dataset consists of photos taken by Korean visually impaired people, questions about the photos, and 10 answers from 10 distinct annotators for each question. Currently, it consists of 30,000 image-question sets and 300,000 answers, but by the end of this year we will increase the dataset to 100,000 image-question sets and 1 million answers. This dataset can be used only for educational and research purposes. Please refer to the attached license for more details. We hope that the KVQA dataset can simultaneously provide opportunities for the development of Korean VQA technology and create meaningful social value in Korean society.
You can download the KVQA dataset via this link.
We measure a model's accuracy against the answers collected from 10 different people for each question. If the model's answer matches 3 or more of the 10 annotators' answers, it scores 100%; if it matches fewer than 3, it receives a proportional partial score. To be consistent with "human accuracy", the measured accuracy is averaged over all 10-choose-9 subsets of human annotators. Please refer to the VQA Evaluation, which we follow.
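Here is a minimal sketch of the scoring rule described above. There are 10 ways to leave one annotator out. For each, the prediction scores min(1, matches / 3) against the remaining 9 answers, and the 10 scores are averaged. The official VQA evaluation also normalizes answer strings (punctuation, articles, number words) before comparing. This sketch only strips surrounding whitespace, which is an assumption, since normalization rules for Korean answers are not described here.

```python
from itertools import combinations


def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Average of min(1, matches / 3) over every subset that leaves out one annotator."""
    pred = predicted.strip()
    answers = [a.strip() for a in human_answers]
    scores = []
    for subset in combinations(answers, len(answers) - 1):
        matches = sum(1 for a in subset if a == pred)
        scores.append(min(1.0, matches / 3))
    return sum(scores) / len(scores)
```

For the first record in the sample below, 2 of the 10 annotators answered "피아노" (piano). The prediction "피아노" therefore scores (2 × 1/3 + 8 × 2/3) / 10 = 0.6, i.e., 60%.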
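```python
from datasets import load_dataset

# Build the dataset with the local loading script "kvqa.py" (config "default"),
# reading raw files from ./data and caching the result in ./huggingface_datasets.
raw_datasets = load_dataset(
    "kvqa.py",
    "default",
    cache_dir="huggingface_datasets",
    data_dir="data",
    verification_mode="no_checks",  # skip checksum/split checks (older releases: ignore_verifications=True)
)
dataset_train = raw_datasets["train"]

# Print only the first training example.
for item in dataset_train:
    print(item)
    break
```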
| | Overall (%) | Yes/no (%) | Number (%) | Other (%) | Unanswerable (%) |
|---|---|---|---|---|---|
| # images | 100,445 (100) | 6,124 (6.10) | 9,332 (9.29) | 69,069 (68.76) | 15,920 (15.85) |
| # questions | 100,445 (100) | 6,124 (6.10) | 9,332 (9.29) | 69,069 (68.76) | 15,920 (15.85) |
| # answers | 1,004,450 (100) | 61,240 (6.10) | 93,320 (9.29) | 690,690 (68.76) | 159,200 (15.85) |
| Name | Type | Description |
|---|---|---|
| VQA | [dict] | list of dict holding VQA data |
| +- image | str | filename of image |
| +- source | str | data source (e.g. `"kvqa"`, `"vizwiz"`) |
| +- answers | [dict] | list of dict holding 10 answers |
| +--- answer | str | answer in string |
| +--- answer_confidence | str | annotator confidence (e.g. `"yes"`, `"maybe"`) |
| +- question | str | question about the image |
| +- answerable | int | whether the question is answerable (`0` or `1`) |
| +- answer_type | str | answer type (e.g. `"number"`, `"other"`) |
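The schema in the table can be expressed as Python `TypedDict` types plus a small runtime check, which is handy when loading the JSON files directly. This is a sketch: the class names, `EXPECTED_TYPES`, and `check_record` are hypothetical, and the `0`/`1` range for `answerable` and the 10-answer count are taken from the description above rather than from a published validator.

```python
from typing import List, TypedDict


class Answer(TypedDict):
    answer: str
    answer_confidence: str  # e.g. "yes" or "maybe"


class VQARecord(TypedDict):
    image: str             # image filename
    source: str            # e.g. "kvqa" or "vizwiz"
    answers: List[Answer]  # 10 annotator answers
    question: str
    answerable: int        # assumed 0 or 1
    answer_type: str       # e.g. "number" or "other"


# Field names and expected Python types, taken from the schema table.
EXPECTED_TYPES = {
    "image": str,
    "source": str,
    "answers": list,
    "question": str,
    "answerable": int,
    "answer_type": str,
}


def check_record(record: dict) -> List[str]:
    """Return human-readable problems with one record (empty list if none)."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if not isinstance(record.get(key), expected):
            problems.append(f"{key}: expected {expected.__name__}")
    answers = record.get("answers")
    if isinstance(answers, list):
        if len(answers) != 10:
            problems.append(f"answers: expected 10 entries, got {len(answers)}")
        for i, a in enumerate(answers):
            if not isinstance(a, dict) or not isinstance(a.get("answer"), str):
                problems.append(f"answers[{i}]: missing 'answer' string")
    return problems


# Simplified record (one answer repeated 10 times) for illustration.
record: VQARecord = {
    "image": "KVQA_190712_00143.jpg",
    "source": "kvqa",
    "answers": [{"answer": "피아노", "answer_confidence": "yes"}] * 10,
    "question": "방에 있는 사람은 지금 뭘하고 있지?",
    "answerable": 1,
    "answer_type": "other",
}
print(check_record(record))
```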
[{
"image": "KVQA_190712_00143.jpg",
"source": "kvqa",
"answers": [{
"answer": "피아노",
"answer_confidence": "yes"
}, {
"answer": "피아노",
"answer_confidence": "yes"
}, {
"answer": "피아노 치고있다",
"answer_confidence": "maybe"
}, {
"answer": "unanswerable",
"answer_confidence": "maybe"
}, {
"answer": "게임",
"answer_confidence": "maybe"
}, {
"answer": "피아노 앞에서 무언가를 보고 있음",
"answer_confidence": "maybe"
}, {
"answer": "피아노치고있어",
"answer_confidence": "maybe"
}, {
"answer": "피아노치고있어요",
"answer_confidence": "maybe"
}, {
"answer": "피아노 연주",
"answer_confidence": "maybe"
}, {
"answer": "피아노 치기",
"answer_confidence": "yes"
}],
"question": "방에 있는 사람은 지금 뭘하고 있지?",
"answerable": 1,
"answer_type": "other"
},
{
"image": "VizWiz_train_000000008148.jpg",
"source": "vizwiz",
"answers": [{
"answer": "리모컨",
"answer_confidence": "yes"
}, {
"answer": "리모컨",
"answer_confidence": "yes"
}, {
"answer": "리모컨",
"answer_confidence": "yes"
}, {
"answer": "티비 리모컨",
"answer_confidence": "yes"
}, {
"answer": "리모컨",
"answer_confidence": "yes"
}, {
"answer": "리모컨",
"answer_confidence": "yes"
}, {
"answer": "리모컨",
"answer_confidence": "yes"
}, {
"answer": "리모컨",
"answer_confidence": "maybe"
}, {
"answer": "리모컨",
"answer_confidence": "yes"
}, {
"answer": "리모컨",
"answer_confidence": "yes"
}],
"question": "이것은 무엇인가요?",
"answerable": 1,
"answer_type": "other"
}
]
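In English, the first record asks "방에 있는 사람은 지금 뭘하고 있지?" ("What is the person in the room doing right now?"), and most annotators answer with variations of "piano" or "playing the piano"; the second asks "이것은 무엇인가요?" ("What is this?"), and nine of ten annotators answer "리모컨" ("remote control"). A common first step with such records is to pick the most frequent annotator answer and measure agreement. The sketch below does that with the standard library; `majority_answer` is a hypothetical helper, and the inline JSON is an abbreviated copy of the second record that keeps only 3 of its 10 answers.

```python
import json
from collections import Counter
from typing import Tuple

# Abbreviated copy of the second record above (3 of its 10 answers kept).
RECORDS_JSON = """
[{"image": "VizWiz_train_000000008148.jpg", "source": "vizwiz",
  "answers": [{"answer": "리모컨", "answer_confidence": "yes"},
              {"answer": "티비 리모컨", "answer_confidence": "yes"},
              {"answer": "리모컨", "answer_confidence": "maybe"}],
  "question": "이것은 무엇인가요?", "answerable": 1, "answer_type": "other"}]
"""


def majority_answer(record: dict) -> Tuple[str, float]:
    """Most frequent annotator answer and the share of annotators who gave it."""
    counts = Counter(a["answer"] for a in record["answers"])
    answer, n = counts.most_common(1)[0]
    return answer, n / len(record["answers"])


records = json.loads(RECORDS_JSON)
for rec in records:
    if rec["answerable"] == 1:  # skip questions marked unanswerable
        print(rec["question"], *majority_answer(rec))
```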
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Approximately 21 million people worldwide could be affected by river floods in an average year. The 15 countries with the most people exposed (India, Bangladesh, China, Vietnam, Pakistan, Indonesia, Egypt, Myanmar, Afghanistan, Nigeria, Brazil, Thailand, Democratic Republic of Congo, Iraq, and Cambodia) account for nearly 80 percent of the total population affected in an average year.
Summary
The Aqueduct Global Flood Risk Country Ranking ranks 163 countries by their current annual average population affected by river floods, using the Aqueduct Global Flood Analyzer. Each country was assigned a country-wide estimated average flood protection level based on its income level.
Cautions
Assumption: We assigned a country-wide average flood protection level to each country based on its income level (World Bank):
1) low-income countries: 10-year flood protection;
2) lower-middle-income countries: 25-year flood protection;
3) upper-middle-income countries: 50-year flood protection;
4) high-income countries: 100-year flood protection;
5) the Netherlands: 1000-year flood protection.
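The income-based protection assumption is effectively a lookup table. A minimal sketch follows; the income-group keys and function names are hypothetical, and reading a T-year protection level as protection against floods with an annual exceedance probability of 1/T is the standard return-period interpretation rather than something stated in this listing.

```python
from typing import Dict

# Return periods (years) from the stated assumption; keys are hypothetical labels.
PROTECTION_BY_INCOME: Dict[str, int] = {
    "low": 10,
    "lower_middle": 25,
    "upper_middle": 50,
    "high": 100,
}
NETHERLANDS_PROTECTION = 1000  # explicit exception in the assumption


def protection_years(country: str, income_group: str) -> int:
    """Assumed country-wide flood protection level, as a return period in years."""
    if country == "Netherlands":
        return NETHERLANDS_PROTECTION
    return PROTECTION_BY_INCOME[income_group]


def annual_exceedance_probability(return_period_years: int) -> float:
    """Yearly chance that a flood exceeds a T-year protection level: 1 / T."""
    return 1.0 / return_period_years


for country, group in [("Country A", "lower_middle"), ("Netherlands", "high")]:
    years = protection_years(country, group)
    print(country, years, annual_exceedance_probability(years))
```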
The global number of Facebook users was forecast to increase continuously between 2023 and 2027, by a total of 391 million users (+14.36 percent). After the fourth consecutive year of growth, the Facebook user base is estimated to reach a new peak of 3.1 billion users in 2027. Notably, the number of Facebook users has increased continuously over the past years. User figures for the Facebook platform have been estimated by taking into account company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macroeconomic, demographic, and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations, and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
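The forecast figures can be cross-checked with simple arithmetic. Assuming the +14.36 percent is measured against the 2023 user base (the listing does not say so explicitly), +391 million users implies a 2023 base of about 2.72 billion and a 2027 total of about 3.11 billion, consistent with the 3.1 billion quoted.

```python
# Figures quoted above: +391 million users, +14.36 percent between 2023 and 2027.
increase_millions = 391
growth_rate = 0.1436  # assumed to be relative to the 2023 user base

base_2023 = increase_millions / growth_rate  # millions
users_2027 = base_2023 + increase_millions   # millions

print(f"Implied 2023 base:  {base_2023 / 1000:.2f} billion")
print(f"Implied 2027 total: {users_2027 / 1000:.2f} billion")
```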
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains 7 pre-generated CSV files of realistic synthetic person records, ranging in size from 10,000 to 10,000,000 records. It is well suited to development, testing, prototyping, and data analysis workflows, with no privacy concerns.
Each CSV file contains complete demographic information:
- person_id: unique identifier
- firstname, lastname: realistic (international) names
- gender, age: demographics
- street, streetnumber, address_unit, postalcode, city: complete addresses
- phone: realistic phone numbers
- email: valid email addresses
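Because the largest file holds 10,000,000 rows, it helps to stream it rather than load it whole. The sketch below uses the standard `csv` module to compute a row count, mean age, and gender breakdown in a single pass. The column names come from the list above; the function name, the file name in the comment, the `"female"`/`"male"` values, and the two sample rows are illustrative assumptions.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable


def summarize_people(rows: Iterable[Dict[str, str]]) -> dict:
    """Stream person rows once, keeping only running totals in memory."""
    count = 0
    age_total = 0
    genders: Counter = Counter()
    for row in rows:
        count += 1
        age_total += int(row["age"])
        genders[row["gender"]] += 1
    return {
        "rows": count,
        "mean_age": age_total / count if count else 0.0,
        "genders": dict(genders),
    }


# Two made-up rows using the documented column names; values are illustrative only.
SAMPLE = (
    "person_id,firstname,lastname,gender,age,street,streetnumber,"
    "address_unit,postalcode,city,phone,email\n"
    "1,Ana,Silva,female,34,Main Street,12,,10115,Berlin,+49 30 1234567,ana@example.com\n"
    "2,Ken,Ito,male,41,Oak Avenue,7,Apt 3,94107,San Francisco,+1 415 555 0100,ken@example.com\n"
)
print(summarize_people(csv.DictReader(io.StringIO(SAMPLE))))

# For a real file (name is hypothetical):
# with open("persons_10000.csv", newline="", encoding="utf-8") as f:
#     print(summarize_people(csv.DictReader(f)))
```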
✓ No privacy concerns: completely synthetic data
✓ Perfect for database testing and imports
✓ Ideal for ML model training and prototyping
✓ Ready-to-use CSV format
✓ Multiple sizes for different use cases
License: CC BY 4.0 (Please attribute to Swain / Swainlabs when sharing)
Alesco Phone ID: Your Comprehensive Identity Graph Solution
In today's complex data landscape, having a clear and accurate view of your customers is essential. Alesco Phone ID provides the foundation for building a robust Identity Graph that delivers unparalleled insights. Our database is a rich source of Identity Data, including Phone Number Data / Telemarketing Data, that enables you to connect with your audience more effectively.
At the heart of our solution is Identity Linkage Data. By combining advanced data matching techniques with a vast array of public and private data sources, we create a powerful Identity Graph that links Phone Number Data to real people. This enables you to build detailed customer profiles, identify new opportunities, and optimize your marketing campaigns.
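To make the idea of identity linkage concrete, the sketch below clusters records that share any identifier (a phone number or email) using a union-find structure, so records connected through a chain of shared identifiers end up in one cluster. This is a generic textbook technique shown for illustration only; it is not Alesco's matching method, which the listing does not describe, and all record ids and identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def link_records(records: Dict[str, Set[str]]) -> List[List[str]]:
    """Cluster record ids that share at least one identifier (phone, email, ...)."""
    uf = UnionFind()
    first_seen: Dict[str, str] = {}  # identifier -> first record id carrying it
    for rec_id, identifiers in records.items():
        uf.find(rec_id)  # register records that share nothing
        for ident in identifiers:
            if ident in first_seen:
                uf.union(first_seen[ident], rec_id)
            else:
                first_seen[ident] = rec_id
    clusters: Dict[str, List[str]] = defaultdict(list)
    for rec_id in records:
        clusters[uf.find(rec_id)].append(rec_id)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical records keyed by id, each with the identifiers observed for it.
records = {
    "r1": {"phone:555-0100", "email:a@example.com"},
    "r2": {"phone:555-0100"},
    "r3": {"email:a@example.com", "phone:555-0199"},
    "r4": {"phone:555-0142"},
}
print(link_records(records))
```

Here r2 and r3 never share an identifier directly, but both link to r1, so all three resolve to one person-level cluster while r4 stays separate.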
With over 860 million Phone Number Data points, including landlines, mobiles, and VoIP, our database offers unmatched coverage. Our proprietary technology processes an impressive 100 million phone signals daily, ensuring data accuracy and freshness. This continuous validation process guarantees that your Identity Graph is always up-to-date.
To provide maximum flexibility, we offer our Phone ID database as an on-premise solution. This gives you complete control over your Identity Data and allows you to integrate it seamlessly into your existing systems.
By leveraging Alesco Phone ID, you can:
Enhance your customer understanding through a robust Identity Graph
Improve campaign targeting and personalization with precise Phone Number Data
Optimize your Telemarketing efforts with accurate contact information
Strengthen fraud prevention and identity verification with reliable Identity Linkage Data
Ready to elevate your data strategy? Contact Alesco today to learn how our Phone ID database can be the cornerstone of your Identity Graph solution.
By Valtteri Kurkela [source]
The dataset is constantly updated and synced hourly to keep the information current. With numerous columns available for analysis and exploration, users can extract valuable insights from this extensive dataset.
Some of the key metrics covered in the dataset include:
1. Vaccinations: The dataset covers total vaccinations administered worldwide, as well as people vaccinated per hundred people and people fully vaccinated per hundred people.
2. Testing & Positivity: Total tests conducted and new tests per thousand people are provided, together with the positive rate (the share of Covid-19 tests that come back positive).
3. Hospital & ICU: Counts of ICU patients and hospital patients are available, along with the corresponding figures normalized per million people. Weekly admissions to intensive care units and hospitals are also provided.
4. Confirmed Cases: Confirmed Covid-19 cases are given both in absolute numbers and normalized per million people.
5. Confirmed Deaths: Total confirmed Covid-19 deaths are provided, along with figures adjusted for population size (total deaths per million).
6. Reproduction Rate: The estimated reproduction rate (R), the average number of new infections caused by each infected person, indicates how contagious the virus is within a particular country or region.
7. Policy Responses: Besides healthcare-related metrics, the dataset includes policy responses implemented by countries or regions, such as lockdown measures or travel restrictions.
8. Other Variables of Interest: The data include socioeconomic factors that may influence Covid-19 outcomes, such as population density, continent, and gross domestic product (GDP) per capita. Demographic and health factors include:
   - Age structure: percentage of the population aged 65 and older, percentage aged 70 and older, and median age
   - Gender-specific factors: percentage of female smokers
   - Lifestyle-related factors: diabetes prevalence rate and extreme poverty rate
9. Excess Mortality: The dataset also provides excess mortality, the percentage increase in deaths above the number expected from historical data.
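The excess-mortality definition above (percentage increase in deaths over the number expected from historical data) reduces to a one-line formula, often called a P-score. A minimal sketch, with a hypothetical function name and made-up numbers; how the expected baseline itself is estimated is outside the scope of this example.

```python
def excess_mortality_pct(observed_deaths: float, expected_deaths: float) -> float:
    """Percentage by which observed deaths exceed the expected (baseline) count."""
    if expected_deaths <= 0:
        raise ValueError("expected_deaths must be positive")
    return 100.0 * (observed_deaths - expected_deaths) / expected_deaths


# Illustrative numbers only: 1,150 deaths observed vs. 1,000 expected.
print(excess_mortality_pct(1150, 1000))
```

A negative result means fewer deaths than expected over the period.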
The dataset consists of numerous columns providing specific information for analysis, such as the ISO code for countries/regions, location names, and units of measurement for different parameters.
Overall, this dataset serves as a valuable resource for researchers, analysts, and policymakers seeking to explore various aspects of Covid-19.
Introduction:
Understanding the Basic Structure:
- The dataset consists of various columns containing different data related to vaccinations, testing, hospitalization, cases, deaths, policy responses, and other key variables.
- Each row represents data for a specific country or region at a certain point in time.
Selecting Desired Columns:
- Identify the specific columns that are relevant to your analysis or research needs.
- Some important columns include population, total cases, total deaths, new cases per million people, and vaccination-related metrics.
Filtering Data:
- Use filters based on specific conditions such as date ranges or continents to focus on relevant subsets of data.
- This can help you analyze trends over time or compare data between different regions.
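Filtering by date range and continent can be done with the standard library alone. In this sketch the column names `location`, `continent`, and `date` (ISO `YYYY-MM-DD`) are assumptions about the file layout, the function name is hypothetical, and the three sample rows are made up.

```python
import csv
import io
from datetime import date
from typing import Iterable, Iterator, Optional


def filter_rows(rows: Iterable[dict], start: date, end: date,
                continent: Optional[str] = None) -> Iterator[dict]:
    """Yield rows dated within [start, end], optionally restricted to one continent."""
    for row in rows:
        day = date.fromisoformat(row["date"])
        if start <= day <= end and (continent is None or row["continent"] == continent):
            yield row


# Tiny made-up sample; the column names are assumed.
SAMPLE = """location,continent,date,new_cases
Country A,Europe,2021-03-01,120
Country A,Europe,2021-04-15,95
Country B,Asia,2021-03-10,40
"""

rows = csv.DictReader(io.StringIO(SAMPLE))
march_europe = list(filter_rows(rows, date(2021, 3, 1), date(2021, 3, 31), "Europe"))
print([(r["location"], r["date"]) for r in march_europe])
```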
Analyzing Vaccination Metrics:
- Explore variables like total_vaccinations, people_vaccinated, and people_fully_vaccinated to assess vaccination coverage in different countries.
- Calculate metrics such as people_vaccinated_per_hundred or total_boosters_per_hundred for standardized comparisons across populations.
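Per-hundred metrics are simply counts scaled by population. The sketch below computes one from raw CSV cells, treating empty cells as missing; the helper names and the sample row are hypothetical, and the column names `people_vaccinated`, `people_fully_vaccinated`, and `population` are assumed to match those mentioned in this section.

```python
from typing import Optional


def to_float(cell: str) -> Optional[float]:
    """Parse a CSV cell, treating an empty cell as missing."""
    return float(cell) if cell.strip() else None


def per_hundred(value: Optional[float], population: Optional[float]) -> Optional[float]:
    """Normalize a count to 'per 100 people'; None if either input is missing."""
    if value is None or not population:
        return None
    return value * 100 / population


# Made-up row; the fully-vaccinated cell is empty, as missing values often are.
row = {"location": "Country A", "people_vaccinated": "4500000",
       "people_fully_vaccinated": "", "population": "10000000"}
pop = to_float(row["population"])
print(per_hundred(to_float(row["people_vaccinated"]), pop))
print(per_hundred(to_float(row["people_fully_vaccinated"]), pop))
```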
Investigating Testing Information:
- Examine columns such as total_tests, new_tests, and tests_per_case to understand testing efforts in various countries.
- Calculate rates like tests_per_case to assess testing efficiency or identify changes in testing strategies over time.
Exploring Hospitalization and ICU Data:
- Analyze variables like hosp_patients, icu_patients, and hospital_beds_per_thousand to understand the strain on healthcare systems.
- Calculate rates like icu_patients_per_million or hosp_patients_per_million for cross-country comparisons.
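Per-million normalization makes countries of very different sizes comparable. The sketch ranks countries by ICU patients per million; the function names are hypothetical and the figures are made up. It shows how a smaller country can carry the heavier relative load despite fewer patients in absolute terms.

```python
from typing import Dict, List, Tuple


def per_million(count: float, population: float) -> float:
    """Scale a count to 'per 1,000,000 people'."""
    return count * 1_000_000 / population


def rank_by_icu_load(data: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Sort countries by ICU patients per million, highest first."""
    rates = [(name, per_million(icu, pop)) for name, (icu, pop) in data.items()]
    return sorted(rates, key=lambda item: item[1], reverse=True)


# Made-up figures: (icu_patients, population)
snapshot = {"Country A": (1200, 60_000_000), "Country B": (300, 5_000_000)}
print(rank_by_icu_load(snapshot))
```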
Assessing Covid-19 Cases and Deaths:
- Analyze variables like total_cases, new_ca...
Success.ai’s Phone Number Data offers direct access to over 50 million verified phone numbers for professionals worldwide, extracted from our expansive collection of 170 million profiles. This robust dataset includes work emails and key decision-maker profiles, making it an essential resource for companies aiming to enhance their communication strategies and outreach efficiency. Whether you're launching targeted marketing campaigns, setting up sales calls, or conducting market research, our phone number data ensures you're connected to the right professionals at the right time.
Why Choose Success.ai’s Phone Number Data?
Direct Communication: Reach out directly to professionals with verified phone numbers and work emails, ensuring your message gets to the right person without delay.
Global Coverage: Our data spans continents, providing phone numbers for professionals in North America, Europe, APAC, and emerging markets.
Continuously Updated: We regularly refresh our dataset to maintain accuracy and relevance, reflecting changes like promotions, company moves, or industry shifts.
Comprehensive Data Points:
Verified Phone Numbers: Direct lines and mobile numbers of professionals across various industries.
Work Emails: Reliable email addresses to complement phone communications.
Professional Profiles: Decision-makers’ profiles including job titles, company details, and industry information.
Flexible Delivery and Integration: Success.ai offers this dataset in various formats suitable for seamless integration into your CRM or sales platform. Whether you prefer API access for real-time data retrieval or static files for periodic updates, we tailor the delivery to meet your operational needs.
Competitive Pricing with Best Price Guarantee: We provide this essential data at the most competitive prices in the industry, ensuring you receive the best value for your investment. Our best price guarantee means you can trust that you are getting the highest quality data at the lowest possible cost.
Targeted Applications for Phone Number Data:
Sales and Telemarketing: Enhance your telemarketing campaigns by reaching out directly to potential customers, bypassing gatekeepers.
Market Research: Conduct surveys and research directly with industry professionals to gather insights that can shape your business strategy.
Event Promotion: Invite prospects to webinars, conferences, and seminars directly through personal calls or SMS.
Customer Support: Improve customer service by integrating accurate contact information into your support systems.
Quality Assurance and Compliance:
Data Accuracy: Our data is verified for accuracy to ensure deliverability rates of over 99%.
Compliance: Fully compliant with GDPR and other international data protection regulations, allowing you to use the data with confidence globally.
Customization and Support:
Tailored Data Solutions: Customize the data according to geographic, industry-specific, or job role filters to match your unique business needs.
Dedicated Support: Our team is on hand to assist with data integration, usage, and any questions you may have.
Start with Success.ai Today: Engage with Success.ai to leverage our Phone Number Data and connect with global professionals effectively. Schedule a consultation or request a sample through our dedicated client portal and begin transforming your outreach and communication strategies today.
Remember, with Success.ai, you don’t just buy data; you invest in a partnership that grows with your business needs, backed by our commitment to quality and affordability.
Bytemine offers access to over 100 million verified personal email addresses for US consumers and professionals. This extensive B2C contact database is designed to support modern outreach, digital marketing, lead generation, and customer engagement across channels that reach people where they are most responsive: their personal inbox.
Unlike traditional work email databases that limit outreach to business hours or corporate filters, personal emails enable more flexible, direct, and often higher-converting communication. Whether you're running direct-to-consumer campaigns, re-engaging inactive users, or enriching existing contact records, Bytemine provides the scale and data quality you need to connect effectively.
Our personal email dataset includes:
100 million+ verified personal email addresses (Gmail, Yahoo, Outlook, etc.)
Matched with names, phone numbers, location, and demographic attributes
50+ enriched fields including age range, gender, location, occupation, and consumer behavior signals
Optional inclusion of job title, company, and professional details for dual B2B-B2C targeting
All emails are verified and regularly updated to ensure deliverability, reduce bounce rates, and improve sender reputation. Contacts are sourced through direct data licensing agreements with consumer platforms, B2C applications, and verified aggregators, ensuring compliance and reliability.
This data is ideal for:
B2C marketing campaigns (email newsletters, promotions, lifecycle emails)
Direct-to-consumer product launches and brand activations
Customer re-engagement and loyalty campaigns
Lookalike audience creation for paid media
CRM enrichment with consumer-facing contact info
Identity resolution and cross-channel targeting
Data onboarding for ad platforms or audience segmentation
Consumer surveys, polling, and research
Bytemine’s personal email dataset empowers your marketing, growth, and data teams with clean, structured, and highly scalable contact information. Each record can be enriched with behavioral and demographic data, enabling advanced personalization and segmentation strategies.
Access is available through:
With flexible delivery options and scalable pricing, Bytemine supports startups, growth teams, agencies, and enterprise platforms looking to expand their reach and drive performance with verified consumer data.
If you're looking to power outreach across consumer inboxes, enrich B2C data, or build a scalable, compliant contact database, Bytemine’s personal email dataset is the fastest way to connect with real people across the United States.