A. SUMMARY This dataset is used to report on public dataset access and usage within the open data portal. Each row counts the number of users who accessed a dataset each day, grouped by access type (API Read, Download, Page View, etc.).
B. HOW THE DATASET IS CREATED This dataset is created by joining two internal analytics datasets generated by the SF Open Data Portal. We remove non-public information during the process.
C. UPDATE PROCESS This dataset is scheduled to update every 7 days via ETL.
D. HOW TO USE THIS DATASET This dataset can help you identify stale datasets, highlight the most popular datasets, and calculate other metrics around performance and usage of the open data portal.
Please note a special call-out for two fields:
- "derived": This field shows whether an asset is an original source (derived = "False") or was made from another asset through filtering (derived = "True").
- "provenance": This field shows whether an asset is "official" (created by someone in the City of San Francisco) or "community" (created by a member of the community). All community assets are derived, as members of the community cannot add data to the open data portal.
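As an example of putting these fields to work, here is a minimal pandas sketch for finding stale official datasets. Every column name except "derived" and "provenance" (e.g., "dataset_id", "date", "access_type", "users", and the file name) is an assumed stand-in, since the full schema is not reproduced here:

```python
import pandas as pd

# Load the usage export; column names other than "derived" and
# "provenance" are assumptions, not the portal's confirmed schema.
df = pd.read_csv("dataset_usage.csv", parse_dates=["date"])

# Keep only official, original (non-derived) assets. Per the description,
# "derived" holds the strings "True"/"False".
official = df[(df["provenance"] == "official") & (df["derived"] == "False")]

# Total users per dataset and access type.
usage = (official.groupby(["dataset_id", "access_type"])["users"]
         .sum()
         .unstack(fill_value=0))
print(usage.head())

# Flag potentially stale datasets: no page views in the last 90 days.
cutoff = df["date"].max() - pd.Timedelta(days=90)
recent = official[(official["date"] >= cutoff)
                  & (official["access_type"] == "Page View")]
stale = set(official["dataset_id"]) - set(recent["dataset_id"])
print(f"{len(stale)} datasets had no page views since {cutoff.date()}")
```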
In an effort to help combat COVID-19, we created a COVID-19 Public Datasets program to make data more accessible to researchers, data scientists, and analysts. The program hosts a repository of public datasets that relate to the COVID-19 crisis and makes them free to access and analyze. These include datasets from the New York Times, the European Centre for Disease Prevention and Control, Google, Global Health Data from the World Bank, and OpenStreetMap.

Free hosting and queries of COVID datasets
As with all data in the Google Cloud Public Datasets Program, Google pays for storage of datasets in the program. BigQuery also provides free queries over certain COVID-related datasets to support the response to COVID-19. Queries on COVID datasets will not count against the BigQuery sandbox free tier, where you can query up to 1 TB free each month.

Limitations and duration
Queries of COVID data are free. To prevent abuse, if you join COVID datasets with non-COVID datasets during your analysis, the bytes processed in the non-COVID datasets are counted against the free tier and then charged accordingly. Queries of COVID datasets will remain free until Sept 15, 2021. The contents of these datasets are provided to the public strictly for educational and research purposes only. We are not onboarding or managing PHI or PII data as part of the COVID-19 Public Dataset Program. Google has practices and policies in place to ensure that data is handled in accordance with widely recognized patient privacy and data security policies. See the list of all datasets included in the program.
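To illustrate, a query against one of the program's public tables might look like the following Python sketch. The `bigquery-public-data.covid19_nyt.us_states` table name and its columns are assumptions based on the program's public listings, not confirmed by this text:

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()  # uses your default GCP credentials and project

# Assumed table/column names; adjust to the dataset you actually need.
query = """
    SELECT date, state_name, confirmed_cases, deaths
    FROM `bigquery-public-data.covid19_nyt.us_states`
    WHERE state_name = 'California'
    ORDER BY date DESC
    LIMIT 10
"""
for row in client.query(query).result():
    print(row.date, row.state_name, row.confirmed_cases, row.deaths)
```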
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Each zipped folder contains results files from reanalysis of public data in our publication, "mirrorCheck: an R package facilitating informed use of DESeq2’s lfcShrink() function for differential gene expression analysis of clinical samples" (see also the Collection description).
These files were produced by rendering the Quarto documents provided in the supplementary data with the publication (one per dataset). The Quarto code for the 3 main analyses (COVID, BRCA, and cell line datasets) performed differential gene expression (DGE) analysis using both DESeq2 with lfcShrink() via our R package mirrorCheck, and also edgeR. Each zipped folder here contains 2 folders, one for each DGE analysis. Since DESeq2 was run on data without prior data cleaning, with prefiltering, or after Surrogate Variable Analysis, the 'mirrorCheck output' folders themselves contain 3 sub-folders titled 'DESeq_noclean', 'DESeq_prefilt', and 'DESeq_sva'. The COVID dataset also has a folder with results from Gene Set Enrichment Analysis. Finally, the fourth folder contains results from a tutorial/vignette-style supplementary file using the Bioconductor "parathyroidSE" dataset. That analysis only utilised DESeq2, with both data cleaning methods and testing two different design formulae, resulting in 5 sub-folders in the zipped folder.
https://creativecommons.org/publicdomain/zero/1.0/
Look at the table given in the spreadsheet and see if there is a correlation between temperature, weather, and revenue in ice cream sales. A pattern does emerge: higher temperatures usually mean more revenue, which seems realistic, and on rainy days ice cream sales drop drastically. However, I wanted to dig further into the data and perform a deeper analysis using a visualization, with the help of regression analysis.
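As a sketch of that deeper analysis, an ordinary least-squares fit of revenue against temperature could look like this; the numbers are invented stand-ins for the spreadsheet, which is not reproduced here:

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data only, made up to mimic the spreadsheet's pattern.
temperature = np.array([14, 17, 20, 23, 26, 29, 32, 35])          # degrees C
revenue = np.array([210, 260, 330, 390, 460, 520, 590, 650])      # dollars

# Ordinary least-squares fit: revenue ~ slope * temperature + intercept.
slope, intercept = np.polyfit(temperature, revenue, deg=1)
r = np.corrcoef(temperature, revenue)[0, 1]
print(f"revenue ~ {slope:.1f} * temp + {intercept:.1f}, r = {r:.3f}")

plt.scatter(temperature, revenue, label="observations")
plt.plot(temperature, slope * temperature + intercept, label="OLS fit")
plt.xlabel("Temperature (C)")
plt.ylabel("Revenue ($)")
plt.legend()
plt.show()
```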
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Public health-related decision-making on policies aimed at controlling the COVID-19 pandemic outbreak depends on complex epidemiological models that are compelled to be robust and use all relevant available data. This data article provides a new combined worldwide COVID-19 dataset obtained from official data sources, with improved systematic measurement errors and a dedicated dashboard for online data visualization and summary. The dataset adds new measures and attributes to the normal attributes of official data sources, such as daily mortality and fatality rates. We used comparative statistical analysis to evaluate the measurement errors of COVID-19 official data collections from the Chinese Center for Disease Control and Prevention (Chinese CDC), the World Health Organization (WHO), and the European Centre for Disease Prevention and Control (ECDC). The data is collected by using text mining techniques and reviewing PDF reports, metadata, and reference data. The combined dataset includes complete spatial data such as country area, international country number, Alpha-2 code, Alpha-3 code, latitude, and longitude, as well as additional attributes such as population.

The improved dataset benefits from major corrections on the referenced datasets and official reports, such as adjustments in the reporting dates (which suffered from a one- to two-day lag), removal of negative values, detection of unreasonable changes in historical data in new reports, and corrections of systematic measurement errors, which have been increasing as the pandemic outbreak spreads and more countries contribute data to the official repositories. Additionally, the root mean square error of attributes in the paired comparison of datasets was used to identify the main data problems. The data for China is presented separately and in more detail; it has been extracted from the attached reports available on the main page of the CCDC website.

This dataset is a comprehensive and reliable source of worldwide COVID-19 data that can be used in epidemiological models assessing the magnitude and timeline of confirmed cases, long-term predictions of deaths or hospital utilization, the effects of quarantine, stay-at-home orders and other social distancing measures, or the pandemic's turning point, as well as in economic and social impact analysis. It can help inform national and local authorities on how to implement an adaptive response approach to re-opening the economy, re-opening schools, alleviating business and social distancing restrictions, designing economic programs, or allowing sports events to resume.
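The paired-comparison RMSE idea can be sketched in a few lines; the file names and the shared "cases" column below are illustrative assumptions, not the article's actual layout:

```python
import numpy as np
import pandas as pd

# Two official sources reporting the same attribute (daily cases),
# each assumed to have columns: date, country, cases.
who = pd.read_csv("who_daily.csv", parse_dates=["date"])
ecdc = pd.read_csv("ecdc_daily.csv", parse_dates=["date"])

# Pair observations on date and country; suffixes disambiguate "cases".
merged = who.merge(ecdc, on=["date", "country"], suffixes=("_who", "_ecdc"))

# Root mean square error between the two sources for the attribute.
rmse = np.sqrt(np.mean((merged["cases_who"] - merged["cases_ecdc"]) ** 2))
print(f"WHO vs ECDC daily-cases RMSE: {rmse:.1f}")
```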
https://creativecommons.org/publicdomain/zero/1.0/
This dataset encompasses comprehensive social and human services data for North Carolina, offering insights into public assistance, child services, vocational rehabilitation, and transfer payments across state and county levels. Each entry delineates specific services within various geographical areas, classified by type, for each year recorded. This rich dataset enables a deep dive into the trends and distributions of social services, assisting in policy-making and community support initiatives.
Potential use cases include:
- Policy development and evaluation
- Academic research
- Community planning
- Grant writing and funding applications
- Public awareness and advocacy
- Economic analysis
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains data from overground walking trials of 166 subjects with several trials per subject (approximately 2900 trials total).
DATA ORIGINS & LICENSE INFORMATION
The data comes from four existing open datasets collected by others:
Schreiber & Moissenet, A multimodal dataset of human gait at different walking speeds established on injury-free adult participants
article: https://www.nature.com/articles/s41597-019-0124-4
Fukuchi et al., A public dataset of overground and treadmill walking kinematics and kinetics in healthy individuals
article: https://peerj.com/articles/4640/
Horst et al., A public dataset of overground walking kinetics and full-body kinematics in healthy adult individuals
article: https://www.nature.com/articles/s41598-019-38748-8
dataset: https://data.mendeley.com/datasets/svx74xcrjr/3
Camargo et al., A comprehensive, open-source dataset of lower limb biomechanics in multiple conditions of stairs, ramps, and level-ground ambulation and transitions
article: https://www.sciencedirect.com/science/article/pii/S0021929021001007
dataset (3 links): https://data.mendeley.com/datasets/fcgm3chfff/1 https://data.mendeley.com/datasets/k9kvm5tn3f/1 https://data.mendeley.com/datasets/jj3r5f9pnf/1
In this dataset, those datasets are referred to as the Schreiber, Fukuchi, Horst, and Camargo datasets, respectively. The Schreiber, Fukuchi, Horst, and Camargo datasets are licensed under the CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).
We have modified the datasets by analyzing the data with musculoskeletal simulations & analysis software (OpenSim). In this dataset, we publish modified data as well as some of the original data.
STRUCTURE OF THE DATASET The dataset contains two kinds of text files: those starting with "predictors_" and those starting with "response_".
Predictors comprise 12 text files, each describing the input (predictor) variables we used to train artificial neural networks to predict knee joint loading peaks. Responses similarly comprise 12 text files, each describing the response (outcome) variables that we trained and evaluated the network on. The file names are of the form "predictors_X" for predictors and "response_X" for responses, where X describes which response (outcome) variable is predicted with them. X can be:
- loading_response_both: the maximum of the first peak of stance for the sum of the loading of the medial and lateral compartments
- loading_response_lateral: the maximum of the first peak of stance for the loading of the lateral compartment
- loading_response_medial: the maximum of the first peak of stance for the loading of the medial compartment
- terminal_extension_both: the maximum of the second peak of stance for the sum of the loading of the medial and lateral compartments
- terminal_extension_lateral: the maximum of the second peak of stance for the loading of the lateral compartment
- terminal_extension_medial: the maximum of the second peak of stance for the loading of the medial compartment
- max_peak_both: the maximum of the entire stance phase for the sum of the loading of the medial and lateral compartments
- max_peak_lateral: the maximum of the entire stance phase for the loading of the lateral compartment
- max_peak_medial: the maximum of the entire stance phase for the loading of the medial compartment
- MFR_common: the medial force ratio for the entire stance phase
- MFR_LR: the medial force ratio for the first peak of stance
- MFR_TE: the medial force ratio for the second peak of stance
The predictor text files are organized as comma-separated values. Each row corresponds to one walking trial. A single subject typically has several trials. The column labels are DATASET_INDEX,SUBJECT_INDEX,KNEE_ADDUCTION,MASS,HEIGHT,BMI,WALKING_SPEED,HEEL_STRIKE_VELOCITY,AGE,GENDER.
DATASET_INDEX describes which original dataset the trial is from, where {1=Schreiber, 2=Fukuchi, 3=Horst, 4=Camargo}
SUBJECT_INDEX is the index of the subject in the original dataset. If you use this column, you will have to re-index subjects to avoid duplicates (e.g., several datasets probably each have a subject "3").
KNEE_ADDUCTION is the knee adduction-abduction angle (positive for adduction, negative for abduction) of the subject in static pose, estimated from motion capture markers.
MASS is the mass of the subject in kilograms
HEIGHT is the height of the subject in millimeters
BMI is the body mass index of the subject
WALKING_SPEED is the mean walking speed of the subject during the trial
HEEL_STRIKE_VELOCITY is the mean of the velocities of the subject's pelvis markers at the instant of heel strike
AGE is the age of the subject in years
GENDER is an integer/boolean where {1=male, 0=female}
The response text files contain one floating-point value per row, describing the knee joint contact force peak for the trial in newtons (or the medial force ratio). Each row corresponds to one walking trial. The rows in predictor and response text files match each other (e.g., row 7 describes the same trial in both predictors_max_peak_medial.txt and response_max_peak_medial.txt).
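A minimal sketch of pairing one predictor file with its matching response file follows; it assumes the predictor files carry the header row quoted above and the response files have none (adjust `header`/`names` if not):

```python
import pandas as pd

# Assumed: predictor files include the header row listed above; response
# files contain one bare floating-point value per row, no header.
X = pd.read_csv("predictors_max_peak_medial.txt")
y = pd.read_csv("response_max_peak_medial.txt", header=None,
                names=["max_peak_medial"])
assert len(X) == len(y), "predictor and response rows must match trial-for-trial"

# Make subject indices globally unique across the four source datasets,
# as advised above (e.g., dataset 2, subject 3 -> "2_3").
X["SUBJECT_ID"] = (X["DATASET_INDEX"].astype(str) + "_"
                   + X["SUBJECT_INDEX"].astype(str))

print(X.join(y).head())  # rows align by position, so a plain join works
```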
See our journal article "Prediction of Knee Joint Compartmental Loading Maxima Utilizing Simple Subject Characteristics and Neural Networks" (https://doi.org/10.1007/s10439-023-03278-y) for more information.
Questions & other contacts: jere.lavikainen@uef.fi
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a traffic dataset which contains a balanced amount of encrypted malicious and legitimate traffic for encrypted malicious traffic detection. The dataset is secondary CSV feature data composed from five public traffic datasets. Our dataset is composed based on three criteria. The first criterion is to combine public datasets that are widely considered in existing works and contain both encrypted malicious and legitimate traffic, such as the Malware Capture Facility Project dataset and the CICIDS-2017 dataset. The second criterion is to ensure data balance, i.e., balance of malicious and legitimate network traffic and a similar amount of network traffic contributed by each individual dataset. Thus, approximate proportions of malicious and legitimate traffic from each selected public dataset are extracted by random sampling. We also ensured that no single selected public dataset contributes far more traffic than the others. The third criterion is that our dataset includes encrypted malicious and legitimate traffic from both conventional devices and IoT devices, as these devices are increasingly being deployed and working in the same environments, such as offices, homes, and other smart city settings.
Based on the criteria, 5 public datasets were selected. After data pre-processing, details of each selected public dataset and the final composed dataset are shown in the "Dataset Statistic Analysis Document". The document summarizes the malicious and legitimate traffic size selected from each public dataset, the proportion of each selected public dataset with respect to the total traffic size of the composed dataset (% w.r.t. the composed dataset), the proportion of encrypted traffic selected from each public dataset (% of selected public dataset), and the total traffic size of the composed dataset. From the table, we can observe that each public dataset contributes approximately 20% of the composed dataset, except for CICIDS-2012 (due to its limited amount of encrypted malicious traffic). This achieves balance across the individual datasets and reduces bias towards traffic belonging to any one dataset during learning. We can also observe that the sizes of malicious and legitimate traffic are almost the same, thus achieving class balance. The datasets now made available were prepared for encrypted malicious traffic detection. Since the dataset is intended for machine learning model training, sample train and test sets are also provided. The train and test datasets are split in a 1:4 ratio, with stratification applied during the split. These datasets can be used directly for machine or deep learning model training based on the selected features.
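A sketch of reproducing such a stratified split with scikit-learn follows. The file name and label column are assumptions, and `test_size=0.2` assumes the "1" in the 1:4 ratio is the test portion:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed file and label-column names; adjust to the published CSVs.
df = pd.read_csv("composed_traffic_features.csv")
X, y = df.drop(columns=["label"]), df["label"]

# 1:4 test:train split, stratified so both sides keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(y_train.value_counts(normalize=True))  # class proportions preserved
```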
https://crawlfeeds.com/privacy_policy
The Fox News Dataset is a comprehensive collection of over 1 million news articles, offering an unparalleled resource for analyzing media narratives, public discourse, and political trends. Covering articles up to the year 2023, this dataset is a treasure trove for researchers, analysts, and businesses interested in gaining deeper insights into the topics and trends covered by Fox News.
This large dataset is ideal for sentiment analysis, trend modeling, and other media research applications.
Discover additional resources for your research needs by visiting our news dataset collection. These datasets are tailored to support diverse analytical applications, including sentiment analysis and trend modeling.
The Fox News Dataset is a must-have for anyone interested in exploring large-scale media data and leveraging it for advanced analysis. Ready to dive into this wealth of information? Download the dataset now in CSV format and start uncovering the stories behind the headlines.
https://brightdata.com/license
Unlock the full potential of BBC broadcast data with our comprehensive dataset featuring transcripts, program schedules, headlines, topics, and multimedia resources. This all-in-one dataset is designed to empower media analysts, researchers, journalists, and advocacy groups with actionable insights for media analysis, transparency studies, and editorial assessments.
Dataset Features
- Transcripts: Access detailed broadcast transcripts, including headlines, content, author details, and publication dates. Perfect for analyzing media framing, topic frequency, and news narratives across various programs.
- Program Schedules: Explore program schedules with accurate timing, show names, and related metadata to track news coverage patterns and identify trends.
- Topics and Keywords: Analyze categorized topics and keywords to understand content diversity, editorial focus, and recurring themes in news broadcasts.
- Multimedia Content: Gain access to videos, images, and related articles linked to each broadcast for a holistic understanding of the news presentation.
- Metadata: Includes critical data points like publication dates, last updates, content URLs, and unique IDs for easier referencing and cross-analysis.
Customizable Subsets for Specific Needs Our BBC dataset is fully customizable to match your research or analytical goals. Focus on transcripts for in-depth media framing analysis, extract multimedia for content visualization studies, or dive into program schedules for broadcast trend analysis. Tailor the dataset to ensure it aligns with your objectives for maximum efficiency and relevance.
Popular Use Cases
- Media Analysis: Evaluate news framing, content diversity, and topic coverage to assess editorial direction and media focus.
- Transparency Studies: Analyze journalistic standards, corrections, and retractions to assess media integrity and accountability.
- Audience Engagement: Identify recurring topics and trends in news content to understand audience preferences and behavior.
- Market Analysis: Track media coverage of key industries, companies, and topics to analyze public sentiment and industry relevance.
- Journalistic Integrity: Use transcripts and metadata to evaluate adherence to reporting practices, fairness, and transparency in news coverage.
- Research and Scholarly Studies: Leverage transcripts and multimedia to support academic studies in journalism, media criticism, and political discourse analysis.
Whether you are evaluating transparency, conducting media criticism, or tracking broadcast trends, our BBC dataset provides you with the tools and insights needed for in-depth research and strategic analysis. Customize your access to focus on the most relevant data points for your unique needs.
Success.ai offers a cutting-edge solution for businesses and organizations seeking Company Financial Data on private and public companies. Our comprehensive database is meticulously crafted to provide verified profiles, including contact details for financial decision-makers such as CFOs, financial analysts, corporate treasurers, and other key stakeholders. This robust dataset is continuously updated and validated using AI technology to ensure accuracy and relevance, empowering businesses to make informed decisions and optimize their financial strategies.
Key Features of Success.ai's Company Financial Data:
Global Coverage: Access data from over 70 million businesses worldwide, including public and private companies across all major industries and regions. Our datasets span 250+ countries, offering extensive reach for your financial analysis and market research.
Detailed Financial Profiles: Gain insights into company financials, including revenue, profit margins, funding rounds, and operational costs. Profiles are enriched with key contact details, including work emails, phone numbers, and physical addresses, ensuring direct access to decision-makers.
Industry-Specific Data: Tailored datasets for sectors such as financial services, manufacturing, technology, healthcare, and energy, among others. Each dataset is customized to meet the unique needs of industry professionals and analysts.
Real-Time Accuracy: With continuous updates powered by AI-driven validation, our financial data maintains a 99% accuracy rate, ensuring you have access to the most reliable and up-to-date information available.
Compliance and Security: All data is collected and processed in strict adherence to global compliance standards, including GDPR, ensuring ethical and lawful usage.
Why Choose Success.ai for Company Financial Data?
Best Price Guarantee: We pride ourselves on offering the most competitive pricing in the industry, ensuring you receive unparalleled value for comprehensive financial data.
AI-Validated Accuracy: Our advanced AI algorithms meticulously verify every data point to ensure precision and reliability, helping you avoid costly errors in your financial decision-making.
Customized Data Solutions: Whether you need data for a specific region, industry, or type of business, we tailor our datasets to align perfectly with your requirements.
Scalable Data Access: From small startups to global enterprises, our platform caters to businesses of all sizes, delivering scalable solutions to suit your operational needs.
Comprehensive Use Cases for Financial Data:
Leverage our detailed financial profiles to create accurate budgets, forecasts, and strategic plans. Gain insights into competitors’ financial health and market positions to make data-driven decisions.
Access key financial details and contact information to streamline your M&A processes. Identify potential acquisition targets or partners with verified profiles and financial data.
Evaluate the financial performance of public and private companies for informed investment decisions. Use our data to identify growth opportunities and assess risk factors.
Enhance your sales outreach by targeting CFOs, financial analysts, and other decision-makers with verified contact details. Utilize accurate email and phone data to increase conversion rates.
Understand market trends and financial benchmarks with our industry-specific datasets. Use the data for competitive analysis, benchmarking, and identifying market gaps.
APIs to Power Your Financial Strategies:
Enrichment API: Integrate real-time updates into your systems with our Enrichment API. Keep your financial data accurate and current to drive dynamic decision-making and maintain a competitive edge.
Lead Generation API: Supercharge your lead generation efforts with access to verified contact details for key financial decision-makers. Perfect for personalized outreach and targeted campaigns.
Tailored Solutions for Industry Professionals:
Financial Services Firms: Gain detailed insights into revenue streams, funding rounds, and operational costs for competitor analysis and client acquisition.
Corporate Finance Teams: Enhance decision-making with precise data on industry trends and benchmarks.
Consulting Firms: Deliver informed recommendations to clients with access to detailed financial datasets and key stakeholder profiles.
Investment Firms: Identify potential investment opportunities with verified data on financial performance and market positioning.
What Sets Success.ai Apart?
Extensive Database: Access detailed financial data for 70M+ companies worldwide, including small businesses, startups, and large corporations.
Ethical Practices: Our data collection and processing methods are fully comp...
This dataset supports the SWAMP Data Dashboard, a public-facing tool developed by the Surface Water Ambient Monitoring Program (SWAMP) to provide accessible, user-friendly access to water quality monitoring data across California. The dashboard and its associated datasets are designed to help the public, researchers, and decision-makers explore and download monitoring data collected from California’s surface waters.
This dataset includes five distinct resources:
These data are collected by SWAMP and its partners to support water quality assessments, identify trends, and inform water resource management. The SWAMP Data Dashboard provides interactive visualizations and filtering tools to explore this data by region, parameter, and more.
The SWAMP dataset is sourced from the California Environmental Data Exchange Network (CEDEN), which serves as the central repository for water quality data collected by various monitoring programs throughout the state. As such, there is some overlap between this dataset and the broader CEDEN datasets also published on the California Open Data Portal (see Related Resources). This SWAMP dataset represents a curated subset of CEDEN data, specifically tailored for use in the SWAMP Data Dashboard.
Access the SWAMP Data Dashboard: https://gispublic.waterboards.ca.gov/swamp-data/
*This dataset is provisional and subject to revision. It should not be used for regulatory purposes.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Public health insurance coverage in India before and after PM-JAY: repeated cross-sectional analysis of nationally representative survey data. The National Family Health Survey (NFHS), India is a publicly available dataset that can be accessed on request. It can be downloaded from the Demographic and Health Survey (DHS) website upon registration at The DHS Program - Request Access To Datasets. We used data from the fourth and fifth rounds of NFHS, which can be accessed after registration at https://dhsprogram.com/data/dataset/India_Standard-DHS_2015.cfm?flag=0 (NFHS-4) and https://dhsprogram.com/data/dataset/India_Standard-DHS_2020.cfm?flag=0 (NFHS-5), respectively. These datasets (HR file) were used to obtain the combined dataset for a paper entitled "Public health insurance coverage in India before and after PM-JAY: repeated cross-sectional analysis of nationally representative survey data", submitted to BMJ Global Health in August 2023.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In light of the rapid accumulation of large-scale omics datasets, numerous studies have attempted to characterize the molecular and clinical features of cancers from a multi-omics perspective. However, there are great challenges in integrating multi-omics data using machine learning methods for cancer subtype classification. In this study, MoGCN, a multi-omics integration model based on a graph convolutional network (GCN), was developed for cancer subtype classification and analysis. Genomics, transcriptomics, and proteomics datasets for 511 breast invasive carcinoma (BRCA) samples were downloaded from The Cancer Genome Atlas (TCGA). The autoencoder (AE) and the similarity network fusion (SNF) methods were used to reduce dimensionality and construct the patient similarity network (PSN), respectively. Then the vector features and the PSN were input into the GCN for training and testing. Feature extraction and network visualization were used for further biological knowledge discovery and subtype classification. In the analysis of multi-dimensional omics data of the BRCA samples in TCGA, MoGCN achieved the highest accuracy in cancer subtype classification compared with several popular algorithms. Moreover, MoGCN can extract the most significant features of each omics layer and provide candidate functional molecules for further analysis of their biological effects, and network visualization showed that MoGCN can make clinically intuitive diagnoses. The generality of MoGCN was proven on the TCGA pan-kidney cancer datasets. MoGCN and the datasets are publicly available at https://github.com/Lifoof/MoGCN. Our study shows that MoGCN performs well for heterogeneous data integration and the interpretability of classification results, which confers great potential for applications in biomarker identification and clinical diagnosis.
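For context, the propagation rule at the heart of a GCN layer is standard; the following is a minimal PyTorch sketch of that rule with the PSN serving as the adjacency matrix, not the authors' implementation (see their repository for that):

```python
import torch

def gcn_layer(A: torch.Tensor, H: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    """One GCN propagation step: relu(D^-1/2 (A+I) D^-1/2 H W).

    A: (N, N) patient similarity network as an adjacency matrix
    H: (N, F_in) node features (e.g., AE-compressed omics vectors)
    W: (F_in, F_out) learnable weights
    """
    A_hat = A + torch.eye(A.size(0))          # add self-loops
    d_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)   # D^-1/2 as a vector
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return torch.relu(A_norm @ H @ W)

# Toy shapes: 511 BRCA patients, 100 fused features, 64 hidden units.
N, F_in, F_out = 511, 100, 64
A = torch.rand(N, N); A = (A + A.T) / 2       # symmetric similarity stand-in
H = torch.randn(N, F_in)
W = torch.randn(F_in, F_out, requires_grad=True)
print(gcn_layer(A, H, W).shape)               # torch.Size([511, 64])
```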
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The NREL PVDAQ is a large-scale time-series database containing system metadata and performance data from a variety of experimental PV sites and commercial public PV sites. The datasets are used to perform ongoing performance and degradation analysis. Some of the sets exhibit common elements that affect PV performance (e.g., soiling). The dataset consists of a series of files devoted to each of the systems and an associated set of metadata that explains details about the system hardware and the site geo-location. Some system datasets also include environmental sensors that cover irradiance, temperatures, wind speeds, and precipitation at the site.
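A hedged sketch of working with one system's time series is below; the file name, column names, and 15-minute sampling interval are assumptions rather than the actual PVDAQ schema:

```python
import pandas as pd

# Assumed file/column names; each per-system file is assumed to hold a
# timestamped AC power series in kW.
ts = pd.read_csv("system_1234_data.csv", parse_dates=["measured_on"],
                 index_col="measured_on")

# Daily energy (kWh) from power (kW), assuming 15-minute (0.25 h) samples.
daily_kwh = ts["ac_power"].resample("D").sum() * 0.25
print(daily_kwh.describe())
```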
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a cleaned version of the Dallas Police Department’s public crime data, sourced from the Dallas Police Crime Analytics Dashboard. It contains detailed information about crime incidents in Dallas from 2022 to January 2025. The data represents RMS (Records Management System) Incidents reported by the Dallas Police Department, reflecting crimes as reported to law enforcement authorities.
The dataset includes a range of crime classifications and related incident details based on preliminary information provided by the reporting parties.
https://crawlfeeds.com/privacy_policy
Bulk Bookstore is an online bookstore. The Crawl Feeds team extracted a few sample records for analysis purposes. Last crawled on 27 Nov 2021.
Introducing Job Posting Datasets: Uncover labor market insights!
Elevate your recruitment strategies, forecast future labor industry trends, and unearth investment opportunities with Job Posting Datasets.
Job Posting Datasets Sources:
Indeed: Access datasets from Indeed, a leading employment website known for its comprehensive job listings.
Glassdoor: Receive ready-to-use employee reviews, salary ranges, and job openings from Glassdoor.
StackShare: Access StackShare datasets to make data-driven technology decisions.
Job Posting Datasets provide meticulously acquired and parsed data, freeing you to focus on analysis. You'll receive clean, structured, ready-to-use job posting data, including job titles, company names, seniority levels, industries, locations, salaries, and employment types.
Choose your preferred dataset delivery options for convenience:
- Receive datasets in various formats, including CSV, JSON, and more.
- Opt for storage solutions such as AWS S3, Google Cloud Storage, and more.
- Customize data delivery frequencies, whether one-time or per your agreed schedule.
Why Choose Oxylabs Job Posting Datasets:
Fresh and accurate data: Access clean and structured job posting datasets collected by our seasoned web scraping professionals, enabling you to dive into analysis.
Time and resource savings: Focus on data analysis and your core business objectives while we efficiently handle the data extraction process cost-effectively.
Customized solutions: Tailor our approach to your business needs, ensuring your goals are met.
Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is a founding member of the Ethical Web Data Collection Initiative, aligning with GDPR and CCPA best practices.
Pricing Options:
Standard Datasets: Choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.
Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.
Experience a seamless journey with Oxylabs:
Effortlessly access fresh job posting data with Oxylabs Job Posting Datasets.
This dataset includes a table of the VOC concentrations detected in firefighter breath samples. QQ-plots for benzene, toluene, and ethylbenzene levels in breath samples, as well as box-and-whisker plots of pre-, post-, and 1 h post-exposure breath levels of VOCs for firefighters participating in attack, search, and outside ventilation positions, are provided. Graphs detailing the responses of individuals to pre-, post-, and 1 h post-exposure concentrations of benzene, toluene, and ethylbenzene are shown. This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects, and because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. The original dataset contains identification information for the firefighters who participated in the controlled structure burns; the analyzed tables and graphs can be made publicly available. This dataset is associated with the following publication: Wallace, A., J. Pleil, K. Oliver, D. Whitaker, S. Mentese, K. Fent, and G. Horn. Targeted GC-MS analysis of firefighters' exhaled breath: Exploring biomarker response at the individual level. Journal of Occupational and Environmental Hygiene, Taylor & Francis, Philadelphia, PA, USA, 16(5): 355-366, (2019).