Facebook
Twitterhttps://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy
Looking for a free Walmart product dataset? The Walmart Products Free Dataset delivers a ready-to-use ecommerce product data CSV containing ~2,100 verified product records from Walmart.com. It includes vital details like product titles, prices, categories, brand info, availability, and descriptions — perfect for data analysis, price comparison, market research, or building machine-learning models.
Complete Product Metadata: Each entry includes URL, title, brand, SKU, price, currency, description, availability, delivery method, average rating, total ratings, image links, unique ID, and timestamp.
CSV Format, Ready to Use: Download instantly - no need for scraping, cleaning or formatting.
Good for E-commerce Research & ML: Ideal for product cataloging, price tracking, demand forecasting, recommendation systems, or data-driven projects.
Free & Easy Access: Priced at USD $0.0, making it a great starting point for developers, data analysts or students.
Facebook
TwitterThis is a dataset downloaded off excelbianalytics.com created off of random VBA logic. I recently performed an extensive exploratory data analysis on it and I included new columns to it, namely: Unit margin, Order year, Order month, Order weekday and Order_Ship_Days which I think can help with analysis on the data. I shared it because I thought it was a great dataset to practice analytical processes on for newbies like myself.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, it does not contain any missing values and data was standardized across features. The small number of samples prevented a full and strong statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.
Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments were performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
The Transportation and Logistics Tracking Dataset comprises multiple datasets related to various aspects of transportation and logistics operations. It includes information on on-time delivery impact, routes by rating, customer ratings, delivery times with and without congestion, weather conditions, and differences between fixed and main delivery times across different regions.
On-Time Delivery Impact: This dataset provides insights into the impact of on-time delivery, categorizing deliveries based on their impact and counting the occurrences for each category. Routes by Rating: Here, the dataset illustrates the relationship between routes and their corresponding ratings, offering a visual representation of route performance across different rating categories. Customer Ratings and On-Time Delivery: This dataset explores the relationship between customer ratings and on-time delivery, presenting a comparison of delivery counts based on customer ratings and on-time delivery status. Delivery Time with and Without Congestion: It contains information on delivery times in various cities, both with and without congestion, allowing for an analysis of how congestion affects delivery efficiency. Weather Conditions: This dataset provides a summary of weather conditions, including counts for different weather conditions such as partly cloudy, patchy light rain with thunder, and sunny. Difference between Fixed and Main Delivery Times: Lastly, the dataset highlights the differences between fixed and main delivery times across different regions, shedding light on regional variations in delivery schedules. Overall, this dataset offers valuable insights into the transportation and logistics domain, enabling analysis and decision-making to optimize delivery processes and enhance customer satisfaction.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Gratis by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Gratis across both sexes and to determine which sex constitutes the majority.
Key observations
There is a slight majority of female population, with 50.0% of total population being female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Scope of gender :
Please note that American Community Survey asks a question about the respondents current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to respond with the answer as either of Male or Female. Our research and this dataset mirrors the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Gratis Population by Race & Ethnicity. You can refer the same here
Facebook
TwitterIn an effort to help combat COVID-19, we created a COVID-19 Public Datasets program to make data more accessible to researchers, data scientists and analysts. The program will host a repository of public datasets that relate to the COVID-19 crisis and make them free to access and analyze. These include datasets from the New York Times, European Centre for Disease Prevention and Control, Google, Global Health Data from the World Bank, and OpenStreetMap. Free hosting and queries of COVID datasets As with all data in the Google Cloud Public Datasets Program , Google pays for storage of datasets in the program. BigQuery also provides free queries over certain COVID-related datasets to support the response to COVID-19. Queries on COVID datasets will not count against the BigQuery sandbox free tier , where you can query up to 1TB free each month. Limitations and duration Queries of COVID data are free. If, during your analysis, you join COVID datasets with non-COVID datasets, the bytes processed in the non-COVID datasets will be counted against the free tier, then charged accordingly, to prevent abuse. Queries of COVID datasets will remain free until Sept 15, 2021. The contents of these datasets are provided to the public strictly for educational and research purposes only. We are not onboarding or managing PHI or PII data as part of the COVID-19 Public Dataset Program. Google has practices & policies in place to ensure that data is handled in accordance with widely recognized patient privacy and data security policies. See the list of all datasets included in the program
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description: - This dataset includes all 22 built-in datasets from the Seaborn library, a widely used Python data visualization tool. Seaborn's built-in datasets are essential resources for anyone interested in practicing data analysis, visualization, and machine learning. They span a wide range of topics, from classic datasets like the Iris flower classification to real-world data such as Titanic survival records and diamond characteristics.
This complete collection serves as an excellent starting point for anyone looking to improve their data science skills, offering a wide array of datasets suitable for both beginners and advanced users.
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains sales transaction data from Blinkit, an online grocery delivery platform. It provides valuable insights into customer purchasing behavior, product demand, revenue trends, and sales performance over time.
This dataset can be beneficial for data scientists, business analysts, and researchers looking to explore e-commerce and retail trends. Feel free to use it for analysis, machine learning models, and business intelligence projects.
Facebook
TwitterIntroducing Job Posting Datasets: Uncover labor market insights!
Elevate your recruitment strategies, forecast future labor industry trends, and unearth investment opportunities with Job Posting Datasets.
Job Posting Datasets Source:
Indeed: Access datasets from Indeed, a leading employment website known for its comprehensive job listings.
Glassdoor: Receive ready-to-use employee reviews, salary ranges, and job openings from Glassdoor.
StackShare: Access StackShare datasets to make data-driven technology decisions.
Job Posting Datasets provide meticulously acquired and parsed data, freeing you to focus on analysis. You'll receive clean, structured, ready-to-use job posting data, including job titles, company names, seniority levels, industries, locations, salaries, and employment types.
Choose your preferred dataset delivery options for convenience:
Receive datasets in various formats, including CSV, JSON, and more. Opt for storage solutions such as AWS S3, Google Cloud Storage, and more. Customize data delivery frequencies, whether one-time or per your agreed schedule.
Why Choose Oxylabs Job Posting Datasets:
Fresh and accurate data: Access clean and structured job posting datasets collected by our seasoned web scraping professionals, enabling you to dive into analysis.
Time and resource savings: Focus on data analysis and your core business objectives while we efficiently handle the data extraction process cost-effectively.
Customized solutions: Tailor our approach to your business needs, ensuring your goals are met.
Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is a founding member of the Ethical Web Data Collection Initiative, aligning with GDPR and CCPA best practices.
Pricing Options:
Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.
Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.
Experience a seamless journey with Oxylabs:
Effortlessly access fresh job posting data with Oxylabs Job Posting Datasets.
Facebook
Twitterhttps://dataverse.harvard.edu/api/datasets/:persistentId/versions/5.0/customlicense?persistentId=doi:10.7910/DVN/ZTMDURhttps://dataverse.harvard.edu/api/datasets/:persistentId/versions/5.0/customlicense?persistentId=doi:10.7910/DVN/ZTMDUR
The Pilot Analysis of Global Ecosystems (PAGE): Agroecosystems was one of four pilot studies undertaken as precursors to the Millennium Ecosystem Assessment. The study identifies linkages between crop production systems and environmental services such as food, soil resources, water, biodiversity, and carbon cycling, in the hopes that a better understanding of these linkages might lead to policies that can contribute both to improved food output and to improved ecosystem service provision. Th e PAGE Agroecosystems report includes a series of 24 maps that provide a detailed spatial perspective on agroecosystems a nd agroecosystem services. Pilot Analysis of Global Ecosystems (PAGE): Agroecosystems Dataset offers the 9 geospatial datasets used to build these maps. The datasets are: PAGE Global Agricultural Extent. The data describe the location and extent of global agriculture and are derived from GLLCCD 1998; USGS EDC1999a. PAGE Global Agricultural Extent version 2. The data are an update of the original PAGE Global Agricultural Extent, based on version 2 of the Global Land Cover Characteristics Dataset (GLCCD v2.0, USGS/EDC 2000). The methods used to create this dataset were the same as those employed to create the origina l PAGE Global Agricultural Extent. Mask of the Global Extent of Agriculture. This dataset displays the global extent of agricultural areas as defined by the PAGE study. The other datasets made available on this site (eg. tree cover, soil carbon, area free of soil constraints) only show values for areas within this agricultural extent. PAGE Global Agroecosystems. These data characterize agroecosystems, defined as "a biological and natural resource system managed by humans for the primary purpose of producing food as well as other socially valuable nonfood products and environmental services." Percentage Tree Cover within the Extent of Agriculture. This is a raster dataset that shows the proportion of land area within the PAGE agricultural extent that is occupied by "woody vegetation" (mature vegetation whose approximate height is greater than 5 meters). Carbon Storage in Soils within the PAGE Agricultural Extent. The data give a global estimate of soil organic carbon storage in agricultural lands, calculated by applying Batjes' (1996 and 2000) soil organic carbon content values by soil type area share of each 5 x 5 minute of the Digital Soil Map of the World (FAO 1995). Agriculture Share of Watershed. This dataset depicts agricultural area as a share of total watershed area. The share of each watershed that is agricultural was calculated by applying a weighted percentage to each PAGE agricultural land cover class. Area Free of Soil Constraints. The data show the proportional area within the PAGE agricultural extent that is free from soil constraints. The area free of soil constraints is based on fertility capability classification (FCC) app lied to FAO's Digital Soil Map of the World (1995). Outline of Land and Water Area. These data are used to provide a boundary for land areas and facilitate the readability of maps.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Free Soil population over the last 20 plus years. It lists the population for each year, along with the year on year change in population, as well as the change in percentage terms for each year. The dataset can be utilized to understand the population change of Free Soil across the last two decades. For example, using this dataset, we can identify if the population is declining or increasing. If there is a change, when the population peaked, or if it is still growing and has not reached its peak. We can also compare the trend with the overall trend of United States population over the same period of time.
Key observations
In 2022, the population of Free Soil was 153, a 0.66% increase year-by-year from 2021. Previously, in 2021, Free Soil population was 152, an increase of 0.66% compared to a population of 151 in 2020. Over the last 20 plus years, between 2000 and 2022, population of Free Soil decreased by 23. In this period, the peak population was 179 in the year 2004. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).
When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).
Data Coverage:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Free Soil Population by Year. You can refer the same here
Facebook
Twitterhttps://brightdata.com/licensehttps://brightdata.com/license
We'll tailor a bespoke airline dataset to meet your unique needs, encompassing flight details, destinations, pricing, passenger reviews, on-time performance, and other pertinent metrics.
Leverage our airline datasets for diverse applications to bolster strategic planning and market analysis. Scrutinizing these datasets enables organizations to grasp traveler preferences and industry trends, facilitating nuanced operational adaptations and marketing initiatives. Customize your access to the entire dataset or specific subsets as per your business requisites.
Popular use cases involve optimizing route profitability, improving passenger satisfaction, and conducting competitor analysis.
Facebook
Twitterhttps://dataverse.harvard.edu/api/datasets/:persistentId/versions/4.0/customlicense?persistentId=doi:10.7910/DVN/DBW86Thttps://dataverse.harvard.edu/api/datasets/:persistentId/versions/4.0/customlicense?persistentId=doi:10.7910/DVN/DBW86T
Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of available dataset of dermatoscopic images. We tackle this problem by releasing the HAM10000 ("Human Against Machine with 10000 training images") dataset. We collected dermatoscopic images from different populations, acquired and stored by different modalities. The final dataset consists of 10015 dermatoscopic images which can serve as a training set for academic machine learning purposes. Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions: Actinic keratoses and intraepithelial carcinoma / Bowen's disease (akiec), basal cell carcinoma (bcc), benign keratosis-like lesions (solar lentigines / seborrheic keratoses and lichen-planus like keratoses, bkl), dermatofibroma (df), melanoma (mel), melanocytic nevi (nv) and vascular lesions (angiomas, angiokeratomas, pyogenic granulomas and hemorrhage, vasc). More than 50% of lesions are confirmed through histopathology (histo), the ground truth for the rest of the cases is either follow-up examination (follow_up), expert consensus (consensus), or confirmation by in-vivo confocal microscopy (confocal). The dataset includes lesions with multiple images, which can be tracked by the lesion_id-column within the HAM10000_metadata file. Due to upload size limitations, images are stored in two files: HAM10000_images_part1.zip (5000 JPEG files) HAM10000_images_part2.zip (5015 JPEG files) Additional data for evaluation purposes The HAM10000 dataset served as the training set for the ISIC 2018 challenge (Task 3), with the same sources contributing the majority of the validation- and test-set as well. The test-set images are available herein as ISIC2018_Task3_Test_Images.zip (1511 images), the ground-truth in the same format as the HAM10000 data (public since 2023) is available as ISIC2018_Task3_Test_GroundTruth.csv.. The ISIC-Archive also provides the challenge images and metadata (training, validation, test) at their "ISIC Challenge Datasets" page. Comparison to physicians Test-set evaluations of the ISIC 2018 challenge were compared to physicians on an international scale, where the majority of challenge participants outperformed expert readers: Tschandl P. et al., Lancet Oncol 2019 Human-computer collaboration The test-set images were also used in a study comparing different methods and scenarios of human-computer collaboration: Tschandl P. et al., Nature Medicine 2020 Following corresponding metadata is available herein: ISIC2018_Task3_Test_NatureMedicine_AI_Interaction_Benefit.csv: Human ratings for Test images with and without interaction with a ResNet34 CNN (Malignancy Probability, Multi-Class probability, CBIR) or Human-Crowd Multi-Class probabilities. This is data was collected for and analyzed in Tschandl P. et al., Nature Medicine 2020, therefore please refer to this publication when using the data. Some details on the abbreviated column headings: image_id: This is the ISIC image_id of an image at the time of the study. There should be no duplications in the combination image_id & interaction_modality. As not every image was shown with every interaction modality, not every combination is present. prob_m_dx_akiec, ... : m is "machine probabilities". Values are values after softmax, and "_mal" is all malignant classes summed. prob_h_dx_akiec, ... : h is "human probabilities". Values are aggregated percentages of human ratings from past studies distinguishing between seven classes. Note there is no "prob_h_mal" as this was none of the tested interaction modalities. user_dx_without_interaction_akiec, ...: Number of participants choosing this diagnosis without interaction. user_dx_with_interaction_akiec, ...: Number of participants choosing this diagnosis with interaction. HAM10000_segmentations_lesion_tschandl.zip: To evaluate regions of CNN activations in Tschandl P. et al., Nature Medicine 2020 (please refer to this publication when using the data), a single dermatologist (Tschandl P) created binary segmentation masks for all 10015 images from the HAM10000 dataset. Masks were initialized with the segmentation network as described by Tschandl et al., Computers in Biology and Medicine 2019, and following verified, corrected or replaced via the free-hand selection tool in FIJI.
Facebook
TwitterSuccess.ai offers a cutting-edge solution for businesses and organizations seeking Company Financial Data on private and public companies. Our comprehensive database is meticulously crafted to provide verified profiles, including contact details for financial decision-makers such as CFOs, financial analysts, corporate treasurers, and other key stakeholders. This robust dataset is continuously updated and validated using AI technology to ensure accuracy and relevance, empowering businesses to make informed decisions and optimize their financial strategies.
Key Features of Success.ai's Company Financial Data:
Global Coverage: Access data from over 70 million businesses worldwide, including public and private companies across all major industries and regions. Our datasets span 250+ countries, offering extensive reach for your financial analysis and market research.
Detailed Financial Profiles: Gain insights into company financials, including revenue, profit margins, funding rounds, and operational costs. Profiles are enriched with key contact details, including work emails, phone numbers, and physical addresses, ensuring direct access to decision-makers.
Industry-Specific Data: Tailored datasets for sectors such as financial services, manufacturing, technology, healthcare, and energy, among others. Each dataset is customized to meet the unique needs of industry professionals and analysts.
Real-Time Accuracy: With continuous updates powered by AI-driven validation, our financial data maintains a 99% accuracy rate, ensuring you have access to the most reliable and up-to-date information available.
Compliance and Security: All data is collected and processed in strict adherence to global compliance standards, including GDPR, ensuring ethical and lawful usage.
Why Choose Success.ai for Company Financial Data?
Best Price Guarantee: We pride ourselves on offering the most competitive pricing in the industry, ensuring you receive unparalleled value for comprehensive financial data.
AI-Validated Accuracy: Our advanced AI algorithms meticulously verify every data point to ensure precision and reliability, helping you avoid costly errors in your financial decision-making.
Customized Data Solutions: Whether you need data for a specific region, industry, or type of business, we tailor our datasets to align perfectly with your requirements.
Scalable Data Access: From small startups to global enterprises, our platform caters to businesses of all sizes, delivering scalable solutions to suit your operational needs.
Comprehensive Use Cases for Financial Data:
Leverage our detailed financial profiles to create accurate budgets, forecasts, and strategic plans. Gain insights into competitors’ financial health and market positions to make data-driven decisions.
Access key financial details and contact information to streamline your M&A processes. Identify potential acquisition targets or partners with verified profiles and financial data.
Evaluate the financial performance of public and private companies for informed investment decisions. Use our data to identify growth opportunities and assess risk factors.
Enhance your sales outreach by targeting CFOs, financial analysts, and other decision-makers with verified contact details. Utilize accurate email and phone data to increase conversion rates.
Understand market trends and financial benchmarks with our industry-specific datasets. Use the data for competitive analysis, benchmarking, and identifying market gaps.
APIs to Power Your Financial Strategies:
Enrichment API: Integrate real-time updates into your systems with our Enrichment API. Keep your financial data accurate and current to drive dynamic decision-making and maintain a competitive edge.
Lead Generation API: Supercharge your lead generation efforts with access to verified contact details for key financial decision-makers. Perfect for personalized outreach and targeted campaigns.
Tailored Solutions for Industry Professionals:
Financial Services Firms: Gain detailed insights into revenue streams, funding rounds, and operational costs for competitor analysis and client acquisition.
Corporate Finance Teams: Enhance decision-making with precise data on industry trends and benchmarks.
Consulting Firms: Deliver informed recommendations to clients with access to detailed financial datasets and key stakeholder profiles.
Investment Firms: Identify potential investment opportunities with verified data on financial performance and market positioning.
What Sets Success.ai Apart?
Extensive Database: Access detailed financial data for 70M+ companies worldwide, including small businesses, startups, and large corporations.
Ethical Practices: Our data collection and processing methods are fully comp...
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the median household income in Madisonville. It can be utilized to understand the trend in median household income and to analyze the income distribution in Madisonville by household type, size, and across various income brackets.
The dataset will have the following datasets when applicable
Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
Explore our comprehensive data analysis and visual representations for a deeper understanding of Madisonville median household income. You can refer the same here
Facebook
TwitterAttribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The prevalence of bias in the news media has become a critical issue, affecting public perception on a range of important topics such as political views, health, insurance, resource distributions, religion, race, age, gender, occupation, and climate change. The media has a moral responsibility to ensure accurate information dissemination and to increase awareness about important issues and the potential risks associated with them. This highlights the need for a solution that can help mitigate against the spread of false or misleading information and restore public trust in the media. Data description: This is a dataset for news media bias covering different dimensions of the biases: political, hate speech, political, toxicity, sexism, ageism, gender identity, gender discrimination, race/ethnicity, climate change, occupation, spirituality, which makes it a unique contribution. The dataset used for this project does not contain any personally identifiable information (PII). Data Format: The format of data is:
ID: Numeric unique identifier. Text: Main content. Dimension: Categorical descriptor of the text. Biased_Words: List of words considered biased. Aspect: Specific topic within the text. Label: Bias True/False value Aggregate Label: Calculated through multiple weighted formulae Annotation Scheme: The annotation scheme is based on Active learning, which is Manual Labeling --> Semi-Supervised Learning --> Human Verifications (iterative process)
Bias Label: Indicate the presence/absence of bias (e.g., no bias, mild, strong).
Words/Phrases Level Biases: Identify specific biased words/phrases.
Subjective Bias (Aspect): Capture biases related to content aspects.
List of datasets used : We curated different news categories like Climate crisis news summaries , occupational, spiritual/faith/ general using RSS to capture different dimensions of the news media biases. The annotation is performed using active learning to label the sentence (either neural/ slightly biased/ highly biased) and to pick biased words from the news.
We also utilize publicly available data from the following links. Our Attribution to others.
MBIC (media bias): Spinde, Timo, Lada Rudnitckaia, Kanishka Sinha, Felix Hamborg, Bela Gipp, and Karsten Donnay. "MBIC--A Media Bias Annotation Dataset Including Annotator Characteristics." arXiv preprint arXiv:2105.11910 (2021). https://zenodo.org/records/4474336
Hyperpartisan news: Kiesel, Johannes, Maria Mestre, Rishabh Shukla, Emmanuel Vincent, Payam Adineh, David Corney, Benno Stein, and Martin Potthast. "Semeval-2019 task 4: Hyperpartisan news detection." In Proceedings of the 13th International Workshop on Semantic Evaluation, pp. 829-839. 2019. https://huggingface.co/datasets/hyperpartisan_news_detection
Toxic comment classification: Adams, C.J., Jeffrey Sorensen, Julia Elliott, Lucas Dixon, Mark McDonald, Nithum, and Will Cukierski. 2017. "Toxic Comment Classification Challenge." Kaggle. https://kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge.
Jigsaw Unintended Bias: Adams, C.J., Daniel Borkan, Inversion, Jeffrey Sorensen, Lucas Dixon, Lucy Vasserman, and Nithum. 2019. "Jigsaw Unintended Bias in Toxicity Classification." Kaggle. https://kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification.
Age Bias : DĂaz, Mark, Isaac Johnson, Amanda Lazar, Anne Marie Piper, and Darren Gergle. "Addressing age-related bias in sentiment analysis." In Proceedings of the 2018 chi conference on human factors in computing systems, pp. 1-14. 2018. Age Bias Training and Testing Data - Age Bias and Sentiment Analysis Dataverse (harvard.edu)
Multi-dimensional news Ukraine: Färber, Michael, Victoria Burkard, Adam Jatowt, and Sora Lim. "A multidimensional dataset based on crowdsourcing for analyzing and detecting news bias." In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 3007-3014. 2020. https://zenodo.org/records/3885351#.ZF0KoxHMLtV
Social biases: Sap, Maarten, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi. "Social bias frames: Reasoning about social and power implications of language." arXiv preprint arXiv:1911.03891 (2019). https://maartensap.com/social-bias-frames/
Goal of this dataset :We want to offer open and free access to dataset, ensuring a wide reach to researchers and AI practitioners across the world. The dataset should be user-friendly to use and uploading and accessing data should be straightforward, to facilitate usage. If you use this dataset, please cite us. Navigating News Narratives: A Media Bias Analysis Dataset © 2023 by Shaina Raza, Vector Institute is licensed under CC BY-NC 4.0
Facebook
Twitterhttps://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy
Monster India Latest jobs dataset. Crawl Feeds team extracted a jobs from monster for reasearch and analysis purposes. Last crawled on 29 sep 2021
Facebook
TwitterThis dataset contains race data from the past ten years of NCAA for the 100 freestyle (men) event. I collected this data using my own Python Script in which you follow along with a race by pressing the "Enter" button with each stroke. Upon the completion of the script, csv and pdf files are generated containing data from the race. I aggregated this data for the completion of my first project.
In order to aggregate, organize, and visualize the data, I had to use a variety of software such as BigQuery (SQL), Python, Tableau, and Google Sheets. This project shows my ability to use a variety of different tools used for data analysis.
Facebook
TwitterCursor logs from an anonymized sample of millions of searchers on the Microsoft Bing search engine are collected and analyzed. They also conduct a 1000 participant study to confirm the validity of the population scale results.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the median household income in Daytona Beach. It can be utilized to understand the trend in median household income and to analyze the income distribution in Daytona Beach by household type, size, and across various income brackets.
The dataset will have the following datasets when applicable
Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
Explore our comprehensive data analysis and visual representations for a deeper understanding of Daytona Beach median household income. You can refer the same here
Facebook
Twitterhttps://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy
Looking for a free Walmart product dataset? The Walmart Products Free Dataset delivers a ready-to-use ecommerce product data CSV containing ~2,100 verified product records from Walmart.com. It includes vital details like product titles, prices, categories, brand info, availability, and descriptions — perfect for data analysis, price comparison, market research, or building machine-learning models.
Complete Product Metadata: Each entry includes URL, title, brand, SKU, price, currency, description, availability, delivery method, average rating, total ratings, image links, unique ID, and timestamp.
CSV Format, Ready to Use: Download instantly - no need for scraping, cleaning or formatting.
Good for E-commerce Research & ML: Ideal for product cataloging, price tracking, demand forecasting, recommendation systems, or data-driven projects.
Free & Easy Access: Priced at USD $0.0, making it a great starting point for developers, data analysts or students.