A. SUMMARY This dataset is used to report on public dataset access and usage within the open data portal. Each row counts the number of users who accessed a dataset each day, grouped by access type (API Read, Download, Page View, etc.).
B. HOW THE DATASET IS CREATED This dataset is created by joining two internal analytics datasets generated by the SF Open Data Portal. We remove non-public information during the process.
C. UPDATE PROCESS This dataset is scheduled to update every 7 days via ETL.
D. HOW TO USE THIS DATASET This dataset can help you identify stale datasets, highlight the most popular datasets, and calculate other metrics around performance and usage of the open data portal.
Please note a special call-out for two fields:
- "derived": This field shows whether an asset is an original source (derived = "False") or was made from another asset through filtering (derived = "True").
- "provenance": This field shows whether an asset is "official" (created by someone in the City of San Francisco) or "community" (created by a member of the community). All community assets are derived, as members of the community cannot add data to the open data portal.
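As an example of putting these fields to work, here is a minimal pandas sketch for finding stale official datasets. Every column name except "derived" and "provenance" (e.g., "dataset_id", "date", "access_type", "users", and the file name) is an assumed stand-in, since the full schema is not reproduced here:

```python
import pandas as pd

# Load the usage export; column names other than "derived" and
# "provenance" are assumptions, not the portal's confirmed schema.
df = pd.read_csv("dataset_usage.csv", parse_dates=["date"])

# Keep only official, original (non-derived) assets. Per the description,
# "derived" holds the strings "True"/"False".
official = df[(df["provenance"] == "official") & (df["derived"] == "False")]

# Total users per dataset and access type.
usage = (official.groupby(["dataset_id", "access_type"])["users"]
         .sum()
         .unstack(fill_value=0))
print(usage.head())

# Flag potentially stale datasets: no page views in the last 90 days.
cutoff = df["date"].max() - pd.Timedelta(days=90)
recent = official[(official["date"] >= cutoff)
                  & (official["access_type"] == "Page View")]
stale = set(official["dataset_id"]) - set(recent["dataset_id"])
print(f"{len(stale)} datasets had no page views since {cutoff.date()}")
```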
In an effort to help combat COVID-19, we created a COVID-19 Public Datasets program to make data more accessible to researchers, data scientists, and analysts. The program hosts a repository of public datasets that relate to the COVID-19 crisis and makes them free to access and analyze. These include datasets from the New York Times, the European Centre for Disease Prevention and Control, Google, Global Health Data from the World Bank, and OpenStreetMap.

Free hosting and queries of COVID datasets
As with all data in the Google Cloud Public Datasets Program, Google pays for storage of datasets in the program. BigQuery also provides free queries over certain COVID-related datasets to support the response to COVID-19. Queries on COVID datasets will not count against the BigQuery sandbox free tier, where you can query up to 1 TB free each month.

Limitations and duration
Queries of COVID data are free. To prevent abuse, if you join COVID datasets with non-COVID datasets during your analysis, the bytes processed in the non-COVID datasets are counted against the free tier and then charged accordingly. Queries of COVID datasets will remain free until Sept 15, 2021. The contents of these datasets are provided to the public strictly for educational and research purposes only. We are not onboarding or managing PHI or PII data as part of the COVID-19 Public Dataset Program. Google has practices and policies in place to ensure that data is handled in accordance with widely recognized patient privacy and data security policies. See the list of all datasets included in the program.
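To illustrate, a query against one of the program's public tables might look like the following Python sketch. The `bigquery-public-data.covid19_nyt.us_states` table name and its columns are assumptions based on the program's public listings, not confirmed by this text:

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()  # uses your default GCP credentials and project

# Assumed table/column names; adjust to the dataset you actually need.
query = """
    SELECT date, state_name, confirmed_cases, deaths
    FROM `bigquery-public-data.covid19_nyt.us_states`
    WHERE state_name = 'California'
    ORDER BY date DESC
    LIMIT 10
"""
for row in client.query(query).result():
    print(row.date, row.state_name, row.confirmed_cases, row.deaths)
```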
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Each zipped folder contains results files from reanalysis of public data in our publication, "mirrorCheck: an R package facilitating informed use of DESeq2’s lfcShrink() function for differential gene expression analysis of clinical samples" (see also the Collection description).
These files were produced by rendering the Quarto documents provided in the supplementary data with the publication (one per dataset). The Quarto code for the 3 main analyses (COVID, BRCA, and cell line datasets) performed differential gene expression (DGE) analysis using both DESeq2 with lfcShrink() via our R package mirrorCheck, and also edgeR. Each zipped folder here contains 2 folders, one for each DGE analysis. Since DESeq2 was run on data without prior data cleaning, with prefiltering, or after Surrogate Variable Analysis, the 'mirrorCheck output' folders themselves contain 3 sub-folders titled 'DESeq_noclean', 'DESeq_prefilt', and 'DESeq_sva'. The COVID dataset also has a folder with results from Gene Set Enrichment Analysis. Finally, the fourth folder contains results from a tutorial/vignette-style supplementary file using the Bioconductor "parathyroidSE" dataset. That analysis only utilised DESeq2, with both data cleaning methods and testing two different design formulae, resulting in 5 sub-folders in the zipped folder.
https://creativecommons.org/publicdomain/zero/1.0/
Look at the table given in the spreadsheet and see if there is a correlation between temperature, weather, and revenue in ice cream sales. A pattern does emerge: higher temperatures usually mean more revenue, which seems realistic, and on rainy days ice cream sales drop drastically. However, I wanted to dig further into the data and perform a deeper analysis using a visualization, with the help of regression analysis.
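As a sketch of that deeper analysis, an ordinary least-squares fit of revenue against temperature could look like this; the numbers are invented stand-ins for the spreadsheet, which is not reproduced here:

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data only, made up to mimic the spreadsheet's pattern.
temperature = np.array([14, 17, 20, 23, 26, 29, 32, 35])          # degrees C
revenue = np.array([210, 260, 330, 390, 460, 520, 590, 650])      # dollars

# Ordinary least-squares fit: revenue ~ slope * temperature + intercept.
slope, intercept = np.polyfit(temperature, revenue, deg=1)
r = np.corrcoef(temperature, revenue)[0, 1]
print(f"revenue ~ {slope:.1f} * temp + {intercept:.1f}, r = {r:.3f}")

plt.scatter(temperature, revenue, label="observations")
plt.plot(temperature, slope * temperature + intercept, label="OLS fit")
plt.xlabel("Temperature (C)")
plt.ylabel("Revenue ($)")
plt.legend()
plt.show()
```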
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Public health-related decision-making on policies aimed at controlling the COVID-19 pandemic outbreak depends on complex epidemiological models that are compelled to be robust and use all relevant available data. This data article provides a new combined worldwide COVID-19 dataset obtained from official data sources, with improved systematic measurement errors and a dedicated dashboard for online data visualization and summary. The dataset adds new measures and attributes to the normal attributes of official data sources, such as daily mortality and fatality rates. We used comparative statistical analysis to evaluate the measurement errors of COVID-19 official data collections from the Chinese Center for Disease Control and Prevention (Chinese CDC), the World Health Organization (WHO), and the European Centre for Disease Prevention and Control (ECDC). The data is collected by using text mining techniques and reviewing PDF reports, metadata, and reference data. The combined dataset includes complete spatial data such as country area, international country number, Alpha-2 code, Alpha-3 code, latitude, and longitude, as well as additional attributes such as population.

The improved dataset benefits from major corrections on the referenced datasets and official reports, such as adjustments in the reporting dates (which suffered from a one- to two-day lag), removal of negative values, detection of unreasonable changes in historical data in new reports, and corrections of systematic measurement errors, which have been increasing as the pandemic outbreak spreads and more countries contribute data to the official repositories. Additionally, the root mean square error of attributes in the paired comparison of datasets was used to identify the main data problems. The data for China is presented separately and in more detail; it has been extracted from the attached reports available on the main page of the CCDC website.

This dataset is a comprehensive and reliable source of worldwide COVID-19 data that can be used in epidemiological models assessing the magnitude and timeline of confirmed cases, long-term predictions of deaths or hospital utilization, the effects of quarantine, stay-at-home orders and other social distancing measures, or the pandemic's turning point, as well as in economic and social impact analysis. It can help inform national and local authorities on how to implement an adaptive response approach to re-opening the economy, re-opening schools, alleviating business and social distancing restrictions, designing economic programs, or allowing sports events to resume.
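The paired-comparison RMSE idea can be sketched in a few lines; the file names and the shared "cases" column below are illustrative assumptions, not the article's actual layout:

```python
import numpy as np
import pandas as pd

# Two official sources reporting the same attribute (daily cases),
# each assumed to have columns: date, country, cases.
who = pd.read_csv("who_daily.csv", parse_dates=["date"])
ecdc = pd.read_csv("ecdc_daily.csv", parse_dates=["date"])

# Pair observations on date and country; suffixes disambiguate "cases".
merged = who.merge(ecdc, on=["date", "country"], suffixes=("_who", "_ecdc"))

# Root mean square error between the two sources for the attribute.
rmse = np.sqrt(np.mean((merged["cases_who"] - merged["cases_ecdc"]) ** 2))
print(f"WHO vs ECDC daily-cases RMSE: {rmse:.1f}")
```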
https://creativecommons.org/publicdomain/zero/1.0/
This dataset encompasses comprehensive social and human services data for North Carolina, offering insights into public assistance, child services, vocational rehabilitation, and transfer payments across state and county levels. Each entry delineates specific services within various geographical areas, classified by type, for each year recorded. This rich dataset enables a deep dive into the trends and distributions of social services, assisting in policy-making and community support initiatives.
Potential use cases include:
- Policy development and evaluation
- Academic research
- Community planning
- Grant writing and funding applications
- Public awareness and advocacy
- Economic analysis
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains data from overground walking trials of 166 subjects with several trials per subject (approximately 2900 trials total).
DATA ORIGINS & LICENSE INFORMATION
The data comes from four existing open datasets collected by others:
Schreiber & Moissenet, A multimodal dataset of human gait at different walking speeds established on injury-free adult participants
article: https://www.nature.com/articles/s41597-019-0124-4
Fukuchi et al., A public dataset of overground and treadmill walking kinematics and kinetics in healthy individuals
article: https://peerj.com/articles/4640/
Horst et al., A public dataset of overground walking kinetics and full-body kinematics in healthy adult individuals
article: https://www.nature.com/articles/s41598-019-38748-8
dataset: https://data.mendeley.com/datasets/svx74xcrjr/3
Camargo et al., A comprehensive, open-source dataset of lower limb biomechanics in multiple conditions of stairs, ramps, and level-ground ambulation and transitions
article: https://www.sciencedirect.com/science/article/pii/S0021929021001007
dataset (3 links): https://data.mendeley.com/datasets/fcgm3chfff/1 https://data.mendeley.com/datasets/k9kvm5tn3f/1 https://data.mendeley.com/datasets/jj3r5f9pnf/1
In this dataset, those datasets are referred to as the Schreiber, Fukuchi, Horst, and Camargo datasets, respectively. The Schreiber, Fukuchi, Horst, and Camargo datasets are licensed under the CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).
We have modified the datasets by analyzing the data with musculoskeletal simulations & analysis software (OpenSim). In this dataset, we publish modified data as well as some of the original data.
STRUCTURE OF THE DATASET The dataset contains two kinds of text files: those starting with "predictors_" and those starting with "response_".
Predictors comprise 12 text files, each describing the input (predictor) variables we used to train artificial neural networks to predict knee joint loading peaks. Responses similarly comprise 12 text files, each describing the response (outcome) variables that we trained and evaluated the network on. The file names are of the form "predictors_X" for predictors and "response_X" for responses, where X describes which response (outcome) variable is predicted with them. X can be:
- loading_response_both: the maximum of the first peak of stance for the sum of the loading of the medial and lateral compartments
- loading_response_lateral: the maximum of the first peak of stance for the loading of the lateral compartment
- loading_response_medial: the maximum of the first peak of stance for the loading of the medial compartment
- terminal_extension_both: the maximum of the second peak of stance for the sum of the loading of the medial and lateral compartments
- terminal_extension_lateral: the maximum of the second peak of stance for the loading of the lateral compartment
- terminal_extension_medial: the maximum of the second peak of stance for the loading of the medial compartment
- max_peak_both: the maximum of the entire stance phase for the sum of the loading of the medial and lateral compartments
- max_peak_lateral: the maximum of the entire stance phase for the loading of the lateral compartment
- max_peak_medial: the maximum of the entire stance phase for the loading of the medial compartment
- MFR_common: the medial force ratio for the entire stance phase
- MFR_LR: the medial force ratio for the first peak of stance
- MFR_TE: the medial force ratio for the second peak of stance
The predictor text files are organized as comma-separated values. Each row corresponds to one walking trial. A single subject typically has several trials. The column labels are DATASET_INDEX,SUBJECT_INDEX,KNEE_ADDUCTION,MASS,HEIGHT,BMI,WALKING_SPEED,HEEL_STRIKE_VELOCITY,AGE,GENDER.
DATASET_INDEX describes which original dataset the trial is from, where {1=Schreiber, 2=Fukuchi, 3=Horst, 4=Camargo}
SUBJECT_INDEX is the index of the subject in the original dataset. If you use this column, you will have to re-index subjects to avoid duplicates (e.g., several datasets probably each have a subject "3").
KNEE_ADDUCTION is the knee adduction-abduction angle (positive for adduction, negative for abduction) of the subject in static pose, estimated from motion capture markers.
MASS is the mass of the subject in kilograms
HEIGHT is the height of the subject in millimeters
BMI is the body mass index of the subject
WALKING_SPEED is the mean walking speed of the subject during the trial
HEEL_STRIKE_VELOCITY is the mean of the velocities of the subject's pelvis markers at the instant of heel strike
AGE is the age of the subject in years
GENDER is an integer/boolean where {1=male, 0=female}
The response text files contain one floating-point value per row, describing the knee joint contact force peak for the trial in newtons (or the medial force ratio). Each row corresponds to one walking trial. The rows in predictor and response text files match each other (e.g., row 7 describes the same trial in both predictors_max_peak_medial.txt and response_max_peak_medial.txt).
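A minimal sketch of pairing one predictor file with its matching response file follows; it assumes the predictor files carry the header row quoted above and the response files have none (adjust `header`/`names` if not):

```python
import pandas as pd

# Assumed: predictor files include the header row listed above; response
# files contain one bare floating-point value per row, no header.
X = pd.read_csv("predictors_max_peak_medial.txt")
y = pd.read_csv("response_max_peak_medial.txt", header=None,
                names=["max_peak_medial"])
assert len(X) == len(y), "predictor and response rows must match trial-for-trial"

# Make subject indices globally unique across the four source datasets,
# as advised above (e.g., dataset 2, subject 3 -> "2_3").
X["SUBJECT_ID"] = (X["DATASET_INDEX"].astype(str) + "_"
                   + X["SUBJECT_INDEX"].astype(str))

print(X.join(y).head())  # rows align by position, so a plain join works
```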
See our journal article "Prediction of Knee Joint Compartmental Loading Maxima Utilizing Simple Subject Characteristics and Neural Networks" (https://doi.org/10.1007/s10439-023-03278-y) for more information.
Questions & other contacts: jere.lavikainen@uef.fi
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a traffic dataset which contains a balanced amount of encrypted malicious and legitimate traffic for encrypted malicious traffic detection. The dataset is secondary CSV feature data composed from five public traffic datasets. Our dataset is composed based on three criteria. The first criterion is to combine public datasets that are widely considered in existing works and contain both encrypted malicious and legitimate traffic, such as the Malware Capture Facility Project dataset and the CICIDS-2017 dataset. The second criterion is to ensure data balance, i.e., balance of malicious and legitimate network traffic and a similar amount of network traffic contributed by each individual dataset. Thus, approximate proportions of malicious and legitimate traffic from each selected public dataset are extracted by random sampling. We also ensured that no single selected public dataset contributes far more traffic than the others. The third criterion is that our dataset includes encrypted malicious and legitimate traffic from both conventional devices and IoT devices, as these devices are increasingly being deployed and working in the same environments, such as offices, homes, and other smart city settings.
Based on the criteria, 5 public datasets were selected. After data pre-processing, details of each selected public dataset and the final composed dataset are shown in the "Dataset Statistic Analysis Document". The document summarizes the malicious and legitimate traffic size selected from each public dataset, the proportion of each selected public dataset with respect to the total traffic size of the composed dataset (% w.r.t. the composed dataset), the proportion of encrypted traffic selected from each public dataset (% of selected public dataset), and the total traffic size of the composed dataset. From the table, we can observe that each public dataset contributes approximately 20% of the composed dataset, except for CICIDS-2012 (due to its limited amount of encrypted malicious traffic). This achieves balance across the individual datasets and reduces bias towards traffic belonging to any one dataset during learning. We can also observe that the sizes of malicious and legitimate traffic are almost the same, thus achieving class balance. The datasets now made available were prepared for encrypted malicious traffic detection. Since the dataset is intended for machine learning model training, sample train and test sets are also provided. The train and test datasets are split in a 1:4 ratio, with stratification applied during the split. These datasets can be used directly for machine or deep learning model training based on the selected features.
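A sketch of reproducing such a stratified split with scikit-learn follows. The file name and label column are assumptions, and `test_size=0.2` assumes the "1" in the 1:4 ratio is the test portion:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed file and label-column names; adjust to the published CSVs.
df = pd.read_csv("composed_traffic_features.csv")
X, y = df.drop(columns=["label"]), df["label"]

# 1:4 test:train split, stratified so both sides keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(y_train.value_counts(normalize=True))  # class proportions preserved
```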
https://crawlfeeds.com/privacy_policy
The Fox News Dataset is a comprehensive collection of over 1 million news articles, offering an unparalleled resource for analyzing media narratives, public discourse, and political trends. Covering articles up to the year 2023, this dataset is a treasure trove for researchers, analysts, and businesses interested in gaining deeper insights into the topics and trends covered by Fox News.
This large dataset is ideal for sentiment analysis, trend modeling, and other media research applications.
Discover additional resources for your research needs by visiting our news dataset collection. These datasets are tailored to support diverse analytical applications, including sentiment analysis and trend modeling.
The Fox News Dataset is a must-have for anyone interested in exploring large-scale media data and leveraging it for advanced analysis. Ready to dive into this wealth of information? Download the dataset now in CSV format and start uncovering the stories behind the headlines.
https://brightdata.com/license
Unlock the full potential of BBC broadcast data with our comprehensive dataset featuring transcripts, program schedules, headlines, topics, and multimedia resources. This all-in-one dataset is designed to empower media analysts, researchers, journalists, and advocacy groups with actionable insights for media analysis, transparency studies, and editorial assessments.
Dataset Features
- Transcripts: Access detailed broadcast transcripts, including headlines, content, author details, and publication dates. Perfect for analyzing media framing, topic frequency, and news narratives across various programs.
- Program Schedules: Explore program schedules with accurate timing, show names, and related metadata to track news coverage patterns and identify trends.
- Topics and Keywords: Analyze categorized topics and keywords to understand content diversity, editorial focus, and recurring themes in news broadcasts.
- Multimedia Content: Gain access to videos, images, and related articles linked to each broadcast for a holistic understanding of the news presentation.
- Metadata: Includes critical data points like publication dates, last updates, content URLs, and unique IDs for easier referencing and cross-analysis.
Customizable Subsets for Specific Needs Our BBC dataset is fully customizable to match your research or analytical goals. Focus on transcripts for in-depth media framing analysis, extract multimedia for content visualization studies, or dive into program schedules for broadcast trend analysis. Tailor the dataset to ensure it aligns with your objectives for maximum efficiency and relevance.
Popular Use Cases
- Media Analysis: Evaluate news framing, content diversity, and topic coverage to assess editorial direction and media focus.
- Transparency Studies: Analyze journalistic standards, corrections, and retractions to assess media integrity and accountability.
- Audience Engagement: Identify recurring topics and trends in news content to understand audience preferences and behavior.
- Market Analysis: Track media coverage of key industries, companies, and topics to analyze public sentiment and industry relevance.
- Journalistic Integrity: Use transcripts and metadata to evaluate adherence to reporting practices, fairness, and transparency in news coverage.
- Research and Scholarly Studies: Leverage transcripts and multimedia to support academic studies in journalism, media criticism, and political discourse analysis.
Whether you are evaluating transparency, conducting media criticism, or tracking broadcast trends, our BBC dataset provides you with the tools and insights needed for in-depth research and strategic analysis. Customize your access to focus on the most relevant data points for your unique needs.
Success.ai offers a cutting-edge solution for businesses and organizations seeking Company Financial Data on private and public companies. Our comprehensive database is meticulously crafted to provide verified profiles, including contact details for financial decision-makers such as CFOs, financial analysts, corporate treasurers, and other key stakeholders. This robust dataset is continuously updated and validated using AI technology to ensure accuracy and relevance, empowering businesses to make informed decisions and optimize their financial strategies.
Key Features of Success.ai's Company Financial Data:
Global Coverage: Access data from over 70 million businesses worldwide, including public and private companies across all major industries and regions. Our datasets span 250+ countries, offering extensive reach for your financial analysis and market research.
Detailed Financial Profiles: Gain insights into company financials, including revenue, profit margins, funding rounds, and operational costs. Profiles are enriched with key contact details, including work emails, phone numbers, and physical addresses, ensuring direct access to decision-makers.
Industry-Specific Data: Tailored datasets for sectors such as financial services, manufacturing, technology, healthcare, and energy, among others. Each dataset is customized to meet the unique needs of industry professionals and analysts.
Real-Time Accuracy: With continuous updates powered by AI-driven validation, our financial data maintains a 99% accuracy rate, ensuring you have access to the most reliable and up-to-date information available.
Compliance and Security: All data is collected and processed in strict adherence to global compliance standards, including GDPR, ensuring ethical and lawful usage.
Why Choose Success.ai for Company Financial Data?
Best Price Guarantee: We pride ourselves on offering the most competitive pricing in the industry, ensuring you receive unparalleled value for comprehensive financial data.
AI-Validated Accuracy: Our advanced AI algorithms meticulously verify every data point to ensure precision and reliability, helping you avoid costly errors in your financial decision-making.
Customized Data Solutions: Whether you need data for a specific region, industry, or type of business, we tailor our datasets to align perfectly with your requirements.
Scalable Data Access: From small startups to global enterprises, our platform caters to businesses of all sizes, delivering scalable solutions to suit your operational needs.
Comprehensive Use Cases for Financial Data:
Leverage our detailed financial profiles to create accurate budgets, forecasts, and strategic plans. Gain insights into competitors’ financial health and market positions to make data-driven decisions.
Access key financial details and contact information to streamline your M&A processes. Identify potential acquisition targets or partners with verified profiles and financial data.
Evaluate the financial performance of public and private companies for informed investment decisions. Use our data to identify growth opportunities and assess risk factors.
Enhance your sales outreach by targeting CFOs, financial analysts, and other decision-makers with verified contact details. Utilize accurate email and phone data to increase conversion rates.
Understand market trends and financial benchmarks with our industry-specific datasets. Use the data for competitive analysis, benchmarking, and identifying market gaps.
APIs to Power Your Financial Strategies:
Enrichment API: Integrate real-time updates into your systems with our Enrichment API. Keep your financial data accurate and current to drive dynamic decision-making and maintain a competitive edge.
Lead Generation API: Supercharge your lead generation efforts with access to verified contact details for key financial decision-makers. Perfect for personalized outreach and targeted campaigns.
Tailored Solutions for Industry Professionals:
Financial Services Firms: Gain detailed insights into revenue streams, funding rounds, and operational costs for competitor analysis and client acquisition.
Corporate Finance Teams: Enhance decision-making with precise data on industry trends and benchmarks.
Consulting Firms: Deliver informed recommendations to clients with access to detailed financial datasets and key stakeholder profiles.
Investment Firms: Identify potential investment opportunities with verified data on financial performance and market positioning.
What Sets Success.ai Apart?
Extensive Database: Access detailed financial data for 70M+ companies worldwide, including small businesses, startups, and large corporations.
Ethical Practices: Our data collection and processing methods are fully comp...
This dataset supports the SWAMP Data Dashboard, a public-facing tool developed by the Surface Water Ambient Monitoring Program (SWAMP) to provide accessible, user-friendly access to water quality monitoring data across California. The dashboard and its associated datasets are designed to help the public, researchers, and decision-makers explore and download monitoring data collected from California’s surface waters.
This dataset includes five distinct resources:
These data are collected by SWAMP and its partners to support water quality assessments, identify trends, and inform water resource management. The SWAMP Data Dashboard provides interactive visualizations and filtering tools to explore this data by region, parameter, and more.
The SWAMP dataset is sourced from the California Environmental Data Exchange Network (CEDEN), which serves as the central repository for water quality data collected by various monitoring programs throughout the state. As such, there is some overlap between this dataset and the broader CEDEN datasets also published on the California Open Data Portal (see Related Resources). This SWAMP dataset represents a curated subset of CEDEN data, specifically tailored for use in the SWAMP Data Dashboard.
Access the SWAMP Data Dashboard: https://gispublic.waterboards.ca.gov/swamp-data/
*This dataset is provisional and subject to revision. It should not be used for regulatory purposes.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Public health insurance coverage in India before and after PM-JAY: repeated cross-sectional analysis of nationally representative survey data. The National Family Health Survey (NFHS), India is a publicly available dataset that can be accessed on request. It can be downloaded from the Demographic and Health Survey (DHS) website upon registration at The DHS Program - Request Access To Datasets. We used data from the fourth and fifth rounds of NFHS, which can be accessed after registration at https://dhsprogram.com/data/dataset/India_Standard-DHS_2015.cfm?flag=0 (NFHS-4) and https://dhsprogram.com/data/dataset/India_Standard-DHS_2020.cfm?flag=0 (NFHS-5), respectively. These datasets (HR file) were used to obtain the combined dataset for a paper entitled "Public health insurance coverage in India before and after PM-JAY: repeated cross-sectional analysis of nationally representative survey data", submitted to BMJ Global Health in August 2023.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In light of the rapid accumulation of large-scale omics datasets, numerous studies have attempted to characterize the molecular and clinical features of cancers from a multi-omics perspective. However, there are great challenges in integrating multi-omics data using machine learning methods for cancer subtype classification. In this study, MoGCN, a multi-omics integration model based on a graph convolutional network (GCN), was developed for cancer subtype classification and analysis. Genomics, transcriptomics, and proteomics datasets for 511 breast invasive carcinoma (BRCA) samples were downloaded from The Cancer Genome Atlas (TCGA). The autoencoder (AE) and the similarity network fusion (SNF) methods were used to reduce dimensionality and construct the patient similarity network (PSN), respectively. Then the vector features and the PSN were input into the GCN for training and testing. Feature extraction and network visualization were used for further biological knowledge discovery and subtype classification. In the analysis of multi-dimensional omics data of the BRCA samples in TCGA, MoGCN achieved the highest accuracy in cancer subtype classification compared with several popular algorithms. Moreover, MoGCN can extract the most significant features of each omics layer and provide candidate functional molecules for further analysis of their biological effects, and network visualization showed that MoGCN can make clinically intuitive diagnoses. The generality of MoGCN was proven on the TCGA pan-kidney cancer datasets. MoGCN and the datasets are publicly available at https://github.com/Lifoof/MoGCN. Our study shows that MoGCN performs well for heterogeneous data integration and the interpretability of classification results, which confers great potential for applications in biomarker identification and clinical diagnosis.
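For context, the propagation rule at the heart of a GCN layer is standard; the following is a minimal PyTorch sketch of that rule with the PSN serving as the adjacency matrix, not the authors' implementation (see their repository for that):

```python
import torch

def gcn_layer(A: torch.Tensor, H: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    """One GCN propagation step: relu(D^-1/2 (A+I) D^-1/2 H W).

    A: (N, N) patient similarity network as an adjacency matrix
    H: (N, F_in) node features (e.g., AE-compressed omics vectors)
    W: (F_in, F_out) learnable weights
    """
    A_hat = A + torch.eye(A.size(0))          # add self-loops
    d_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)   # D^-1/2 as a vector
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return torch.relu(A_norm @ H @ W)

# Toy shapes: 511 BRCA patients, 100 fused features, 64 hidden units.
N, F_in, F_out = 511, 100, 64
A = torch.rand(N, N); A = (A + A.T) / 2       # symmetric similarity stand-in
H = torch.randn(N, F_in)
W = torch.randn(F_in, F_out, requires_grad=True)
print(gcn_layer(A, H, W).shape)               # torch.Size([511, 64])
```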
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The NREL PVDAQ is a large-scale time-series database containing system metadata and performance data from a variety of experimental PV sites and commercial public PV sites. The datasets are used to perform ongoing performance and degradation analysis. Some of the sets exhibit common elements that affect PV performance (e.g., soiling). The dataset consists of a series of files devoted to each of the systems and an associated set of metadata that explains details about the system hardware and the site geo-location. Some system datasets also include environmental sensors that cover irradiance, temperatures, wind speeds, and precipitation at the site.
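A hedged sketch of working with one system's time series is below; the file name, column names, and 15-minute sampling interval are assumptions rather than the actual PVDAQ schema:

```python
import pandas as pd

# Assumed file/column names; each per-system file is assumed to hold a
# timestamped AC power series in kW.
ts = pd.read_csv("system_1234_data.csv", parse_dates=["measured_on"],
                 index_col="measured_on")

# Daily energy (kWh) from power (kW), assuming 15-minute (0.25 h) samples.
daily_kwh = ts["ac_power"].resample("D").sum() * 0.25
print(daily_kwh.describe())
```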
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a cleaned version of the Dallas Police Department’s public crime data, sourced from the Dallas Police Crime Analytics Dashboard. It contains detailed information about crime incidents in Dallas from 2022 to January 2025. The data represents RMS (Records Management System) Incidents reported by the Dallas Police Department, reflecting crimes as reported to law enforcement authorities.
The dataset includes a range of crime classifications and related incident details based on preliminary information provided by the reporting parties.
https://crawlfeeds.com/privacy_policy
Bulk Bookstore is an online bookstore. The Crawl Feeds team extracted a few sample records for analysis purposes. Last crawled on 27 Nov 2021.
Introducing Job Posting Datasets: Uncover labor market insights!
Elevate your recruitment strategies, forecast future labor industry trends, and unearth investment opportunities with Job Posting Datasets.
Job Posting Datasets Sources:
Indeed: Access datasets from Indeed, a leading employment website known for its comprehensive job listings.
Glassdoor: Receive ready-to-use employee reviews, salary ranges, and job openings from Glassdoor.
StackShare: Access StackShare datasets to make data-driven technology decisions.
Job Posting Datasets provide meticulously acquired and parsed data, freeing you to focus on analysis. You'll receive clean, structured, ready-to-use job posting data, including job titles, company names, seniority levels, industries, locations, salaries, and employment types.
Choose your preferred dataset delivery options for convenience:
- Receive datasets in various formats, including CSV, JSON, and more.
- Opt for storage solutions such as AWS S3, Google Cloud Storage, and more.
- Customize data delivery frequencies, whether one-time or per your agreed schedule.
Why Choose Oxylabs Job Posting Datasets:
Fresh and accurate data: Access clean and structured job posting datasets collected by our seasoned web scraping professionals, enabling you to dive into analysis.
Time and resource savings: Focus on data analysis and your core business objectives while we efficiently handle the data extraction process cost-effectively.
Customized solutions: Tailor our approach to your business needs, ensuring your goals are met.
Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is a founding member of the Ethical Web Data Collection Initiative, aligning with GDPR and CCPA best practices.
Pricing Options:
Standard Datasets: Choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.
Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.
Experience a seamless journey with Oxylabs:
Effortlessly access fresh job posting data with Oxylabs Job Posting Datasets.
This dataset includes a table of the VOC concentrations detected in firefighter breath samples. QQ-plots for benzene, toluene, and ethylbenzene levels in breath samples, as well as box-and-whisker plots of pre-, post-, and 1 h post-exposure breath levels of VOCs for firefighters participating in attack, search, and outside ventilation positions, are provided. Graphs detailing the responses of individuals to pre-, post-, and 1 h post-exposure concentrations of benzene, toluene, and ethylbenzene are shown. This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects, and because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. The original dataset contains identification information for the firefighters who participated in the controlled structure burns; the analyzed tables and graphs can be made publicly available. This dataset is associated with the following publication: Wallace, A., J. Pleil, K. Oliver, D. Whitaker, S. Mentese, K. Fent, and G. Horn. Targeted GC-MS analysis of firefighters' exhaled breath: Exploring biomarker response at the individual level. Journal of Occupational and Environmental Hygiene, Taylor & Francis, Philadelphia, PA, USA, 16(5): 355-366, (2019).