Company Datasets for valuable business insights!
Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.
These datasets are sourced from top industry providers, ensuring you have access to high-quality information:
We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:
You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.
Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.
With Oxylabs Datasets, you can count on:
Pricing Options:
Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.
Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.
Experience a seamless journey with Oxylabs:
Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!
Imaging and bio-sample data sets related to patients with active and inactive Inflammatory Bowel Disease
Full profiles of 10,000 people in the US (download here, data schema here), with more than 40 data points, including full name, education, location, work experience history, and many more.
An additional 258+ million US people profiles are available; visit the LinkDB product page here.
Our LinkDB database is an exhaustive database of publicly accessible LinkedIn people and company profiles. It contains close to 500 million people and company profiles globally.
The User Profile Data is a structured, anonymized dataset designed to help organizations understand who their users are, what devices they use, and where they are located. Each record provides privacy-compliant linkages between user IDs, demographic profiles, device intelligence, and geolocation data, offering deep context for analytics, segmentation, and personalization.
Built for privacy-safe analytics, the dataset uses hashed identifiers (such as phone number and email) and standardized formats, making it easy to integrate into big-data platforms, AI pipelines, and machine learning models for advanced analytics.
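To make the idea of hashed identifiers concrete, here is a minimal sketch of how an email or phone number could be normalized and hashed before joining against such a dataset. The source does not say which hash function or normalization rules the provider uses; SHA-256, lowercasing, and digits-only phone formatting are assumptions chosen for illustration, and the function names are hypothetical.

```python
import hashlib


def hash_identifier(value: str) -> str:
    """Return the SHA-256 hex digest of a trimmed, lowercased identifier.

    Assumption: the provider's real hashing scheme is not documented here;
    SHA-256 over the normalized UTF-8 string is used only as an example.
    """
    normalized = value.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def normalize_phone(phone: str) -> str:
    """Keep digits only so formatting differences do not change the hash."""
    return "".join(ch for ch in phone if ch.isdigit())


# Hypothetical inputs, not taken from the dataset.
email_hash = hash_identifier("  Jane.Doe@Example.com ")
phone_hash = hash_identifier(normalize_phone("+1 (555) 010-0199"))
```

Hashes only match across systems when both sides apply identical normalization, so the provider's documented rules should replace the assumptions above before any real join.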
Demographic insights include gender, age, and age group, essential for audience profiling, marketing optimization, and consumer intelligence. All gender data is user-declared and AI-verified through image-based avatar validation, ensuring data accuracy and authenticity.
The dataset’s Device Intelligence Layer includes rich technical attributes such as device brand, model, OS version, user agent, RAM, language, and timezone, enabling technical segmentation, performance analytics, and targeted ad delivery across diverse device ecosystems.
On the location and POI front, the dataset combines GPS-based and IP-based coordinates (including country, region, city, latitude, and longitude) to provide high-precision geospatial insights. This enables mobility pattern analysis, market expansion planning, and POI clustering for advanced location intelligence.
Each user record contains onboarding and lifecycle fields, such as unique IDs and profile update timestamps, allowing accurate tracking of user acquisition trends, data freshness, and activity duration.
🔍 Key Features • 1st-party, consent-based demographic & device data • AI-verified gender insights via avatar recognition • OS-level app data with 120+ daily sessions per user • Global coverage across APAC and emerging markets • GPS + IP-based geolocation & POI intelligence • Privacy-compliant, hashed identifiers for safe integration
🚀 Use Cases • Audience segmentation & lookalike modeling • Ad-tech and mar-tech optimization • Geospatial & POI analytics • Fraud detection & risk scoring • Personalization & recommendation engines • App performance & device compatibility insights
🏢 Industries Served Ad-Tech • Mar-Tech • FinTech • Telecom • Retail Analytics • Consumer Intelligence • AI & ML Platforms
Although soil and agronomy data collection in Ethiopia began over 60 years ago, the data are hardly accessible because they are scattered across different organizations and mostly held by individuals (Ashenafi et al., 2020; Tamene et al., 2022), which makes them vulnerable to permanent loss. Cognizant of the problem, the Coalition of the Willing (CoW) for data sharing and access was created in 2018 with the joint support and coordination of the Alliance Bioversity-CIAT and GIZ (https://www.ethioagridata.com/index.html). Mobilizing its members, the CoW has embarked on data rescue operations, including data ecosystem mapping, collation, and curation of the legacy data, which were put into the central data repository for its members and the wider data user community according to the guideline developed based on the FAIR data principles and approved by the CoW. So far, the CoW has collated and rescued about 20,000 legacy soil profile data and over 38,000 crop responses to fertilizer data (Tamene et al., 2022). The legacy soil profile dataset (consisting of Profiles Site = 1,776 observations with 37 variables; Profiles Layer Field = 1,493 observations with 64 variables; Profiles Layer Lab = 1,386 observations with 76 variables) was extracted, transformed, and uploaded into a harmonized template (adapted from Batjes, 2022; Leenaars et al., 2014) from the following source: the Bilateral Ethiopian-Netherlands Effort for Food, Income and Trade (BENEFIT) Partnership, which is a portfolio of five programs (ISSD, Cascape, ENTAG, SBN, and REALISE) and is funded by the government of the Kingdom of the Netherlands through its embassy in Addis Ababa.
The BENEFIT-REALISE program implements its interventions in 60 PSNP weredas in four regions (Tigray, Amhara, Oromia, and SNNPR). Accordingly, in 2019, BENEFIT-REALISE along with the MoA initiated a wereda-wide soil resource characterization and mapping task at 1:50,000 scale in 15 BENEFIT-REALISE intervention weredas: 3 in Tigray, 6 in Amhara, 3 in Oromia, and 3 in SNNPR.
References:
Ashenafi, A., Tamene, L., and Erkossa, T. 2020. Identifying, Cataloguing, and Mapping Soil and Agronomic Data in Ethiopia. CIAT Publication No. 506. International Center for Tropical Agriculture (CIAT). Addis Ababa, Ethiopia. 42 p. 10.13140/RG.2.2.31759.41123.
Ashenafi, A., Erkossa, T., Gudeta, K., Abera, W., Mesfin, E., Mekete, T., Haile, M., Haile, W., Abegaz, A., Tafesse, D. and Belay, G., 2022. Reference Soil Groups Map of Ethiopia Based on Legacy Data and Machine Learning Technique: EthioSoilGrids 1.0. EGUsphere, pp.1-40. https://doi.org/10.5194/egusphere-2022-301
Tamene L; Erkossa T; Tafesse T; Abera W; Schultz S. 2021. A coalition of the Willing - Powering data-driven solutions for Ethiopian Agriculture. CIAT Publication No. 518. International Center for Tropical Agriculture (CIAT). Addis Ababa, Ethiopia. 34 p. https://www.ethioagridata.com/Resources/Powering%20Data-Driven%20Solutions%20for%20Ethiopian%20Agriculture.pdf.
The Coalition of the Willing (CoW) website: https://www.ethioagridata.com/index.html.
Batjes, N.H., 2022. Basic principles for compiling a profile dataset for consideration in WoSIS. CoP report, ISRIC–World Soil Information, Wageningen. Contents Summary, 4(1), p.3.
Carvalho Ribeiro, E.D. and Batjes, N.H., 2020. World Soil Information Service (WoSIS)-Towards the standardization and harmonization of world soil data: Procedures Manual 2020.
Elias, E.: Soils of the Ethiopian Highlands: Geomorphology and Properties, CASCAPE Project, ALTERRA, Wageningen UR, the Netherlands, library.wur.nl/WebQuery/isric/2259099, 2016.
Leenaars, J. G. B., van Oostrum, A. J. M., and Ruiperez, G. M.: Africa Soil Profiles Database, Version 1.2. A compilation of georeferenced and standardised legacy soil profile data for Sub Saharan Africa (with dataset), ISRIC Report 2014/01, Africa Soil Information Service (AfSIS) project and ISRIC – World Soil Information, Wageningen, library.wur.nl/WebQuery/isric/2259472, 2014.
Leenaars, J. G. B., Eyasu, E., Wösten, H., Ruiperez González, M., Kempen, B., Ashenafi, A., and Brouwer, F.: Major soil-landscape resources of the cascape intervention woredas, Ethiopia: Soil information in support to scaling up of evidence-based best practices in agricultural production (with dataset), CASCAPE working paper series No. OT_CP_2016_1, Cascape. https://edepot.wur.nl/428596, 2016.
Leenaars, J. G. B., Elias, E., Wösten, J. H. M., Ruiperez-González, M., and Kempen, B.: Mapping the major soil-landscape resources of the Ethiopian Highlands using random forest, Geoderma, 361, https://doi.org/10.1016/j.geoderma.2019.114067, 2020a.
Leenaars, J. G. B., Ruiperez, M., González, M., Kempen, B., and Mantel, S.: Semi-detailed soil resource survey and mapping of REALISE woredas in Ethiopia, Project report to the BENEFIT-REALISE programme, December, ISRIC-World Soil Information, Wageningen, 2020b.
TERMS: Access to the data is limited to the CoW members until the national soil and agronomy data-sharing directive of the MoA is registered by the Ministry of Justice and released for implementation. DISCLAIMER: The dataset populated in the harmonized template consisting of 76 variables was extracted, transformed, and uploaded from the source document by the CoW. Hence, if any irregularities are observed, data users should refer to the source document uploaded along with the dataset. Use of the dataset and any consequences arising from using it are the user’s sole responsibility.
The data was taken from the Chicago Data Portal via an API, then cleaned and saved as a CSV file for further analysis. It was used to test a machine learning model and to assign SSL (Strategic Subject List) scores to every individual in the records.
The dataset can be used as a tool in the domain of criminology. It contains 1000 rows and 7 columns and is particularly useful for criminal profiling, as it contains the person's locality, age, zip code, gender, and race, the district where the majority of criminal reports have been filed, and their latest district of report. This information assists in training models to predict the probability that a convicted individual will reoffend, based on their locality, age, and degree of crime, along with various other parameters.
The Cybersecurity Framework Manufacturing Profile Low Security Level Example Implementations Guide provides example proof-of-concept solutions demonstrating how currently available open-source and commercial off-the-shelf (COTS) products can be implemented in manufacturing environments to satisfy the requirements in the Cybersecurity Framework (CSF) Manufacturing Profile [8] Low Security Level. Example proof-of-concept solutions for a process-based manufacturing environment and a discrete-based manufacturing environment are included in the guide. Depending on factors like size, sophistication, risk tolerance, and threat landscape, manufacturers should make their own determinations about the breadth of the proof-of-concept solutions they may voluntarily implement. The dataset records the Key Performance Indicator (KPI) for the example implementation of the process-based manufacturing system use case.
Access B2B Contact Data for North American Small Business Owners with Success.ai, your go-to provider for verified, high-quality business datasets. This dataset is tailored for businesses, agencies, and professionals seeking direct access to decision-makers within the small business ecosystem across North America. With over 170 million professional profiles, it’s an unparalleled resource for powering your marketing, sales, and lead generation efforts.
Key Features of the Dataset:
Verified Contact Details
Includes accurate and up-to-date email addresses and phone numbers to ensure you reach your targets reliably.
AI-validated for 99% accuracy, eliminating errors and reducing wasted efforts.
Detailed Professional Insights
Comprehensive data points include job titles, skills, work experience, and education to enable precise segmentation and targeting.
Enriched with insights into decision-making roles, helping you connect directly with small business owners, CEOs, and other key stakeholders.
Business-Specific Information
Covers essential details such as industry, company size, location, and more, enabling you to tailor your campaigns effectively. Ideal for profiling and understanding the unique needs of small businesses.
Continuously Updated Data
Our dataset is maintained and updated regularly to ensure relevance and accuracy in fast-changing market conditions. New business contacts are added frequently, helping you stay ahead of the competition.
Why Choose Success.ai?
At Success.ai, we understand the critical importance of high-quality data for your business success. Here’s why our dataset stands out:
Tailored for Small Business Engagement
Focused specifically on North American small business owners, this dataset is an invaluable resource for building relationships with SMEs (Small and Medium Enterprises). Whether you’re targeting startups, local businesses, or established small enterprises, our dataset has you covered.
Comprehensive Coverage Across North America
Spanning the United States, Canada, and Mexico, our dataset ensures wide-reaching access to verified small business contacts in the region.
Categories Tailored to Your Needs
Includes highly relevant categories such as Small Business Contact Data, CEO Contact Data, B2B Contact Data, and Email Address Data to match your marketing and sales strategies.
Customizable and Flexible
Choose from a wide range of filtering options to create datasets that meet your exact specifications, including filtering by industry, company size, geographic location, and more.
Best Price Guaranteed
We pride ourselves on offering the most competitive rates without compromising on quality. When you partner with Success.ai, you receive superior data at the best value.
Seamless Integration
Delivered in formats that integrate effortlessly with your CRM, marketing automation, or sales platforms, so you can start acting on the data immediately.
Use Cases: This dataset empowers you to:
Drive Sales Growth: Build and refine your sales pipeline by connecting directly with decision-makers in small businesses.
Optimize Marketing Campaigns: Launch highly targeted email and phone outreach campaigns with verified contact data.
Expand Your Network: Leverage the dataset to build relationships with small business owners and other key figures within the B2B landscape.
Improve Data Accuracy: Enhance your existing databases with verified, enriched contact information, reducing bounce rates and increasing ROI.
Industries Served: Whether you're in B2B SaaS, digital marketing, consulting, or any field requiring accurate and targeted contact data, this dataset serves industries of all kinds. It is especially useful for professionals focused on:
Lead Generation
Business Development
Market Research
Sales Outreach
Customer Acquisition
What’s Included in the Dataset: Each profile provides:
Full Name
Verified Email Address
Phone Number (where available)
Job Title
Company Name
Industry
Company Size
Location
Skills and Professional Experience
Education Background
With over 170 million profiles, you can tap into a wealth of opportunities to expand your reach and grow your business.
Why High-Quality Contact Data Matters: Accurate, verified contact data is the foundation of any successful B2B strategy. Reaching small business owners and decision-makers directly ensures your message lands where it matters most, reducing costs and improving the effectiveness of your campaigns. By choosing Success.ai, you ensure that every contact in your pipeline is a genuine opportunity.
Partner with Success.ai for Better Data, Better Results: Success.ai is committed to delivering premium-quality B2B data solutions at scale. With our small business owner dataset, you can unlock the potential of North America's dynamic small business market.
Get Started Today Request a sample or customize your dataset to fit your unique...
A random sample of households was invited to participate in this survey. In the dataset, each row holds the respondent-level data and each column holds a question. The numbers represent a scale option from the survey, such as 1=Excellent, 2=Good, 3=Fair, 4=Poor. The question stem, response option, and scale information for each field can be found in the "variable labels" and "value labels" sheets. VERY IMPORTANT NOTE: The scientific survey data were weighted, meaning that the demographic profile of respondents was compared to the demographic profile of adults in Bloomington from US Census data. Statistical adjustments were made to bring the respondent profile into balance with the population profile. This means that some records were given more weight and some records were given less weight. The weights that were applied are found in the field "wt". If you do not apply these weights, you will not obtain the same results as can be found in the report delivered to Bloomington. The easiest way to replicate these results is likely to create pivot tables and use the sum of the "wt" field rather than a count of responses.
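The weighting note above can be sketched in a few lines of Python: instead of counting responses per scale option, sum the "wt" field per option and divide by the total weight. Only the field name "wt" and the 1=Excellent, 2=Good coding come from the description; the question column name `q1` and the sample rows are made up for illustration.

```python
from collections import defaultdict


def weighted_shares(rows, question, weight_field="wt"):
    """Return each response option's share of the total weight.

    Rows with a missing answer are skipped, mirroring how a pivot table
    on the sum of "wt" would ignore blank cells.
    """
    totals = defaultdict(float)
    for row in rows:
        answer = row.get(question)
        if answer in (None, ""):
            continue
        totals[answer] += float(row[weight_field])
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {answer: total / grand_total for answer, total in totals.items()}


# Hypothetical respondent rows: q1 uses the 1=Excellent, 2=Good coding.
rows = [
    {"q1": 1, "wt": 0.5},
    {"q1": 1, "wt": 1.5},
    {"q1": 2, "wt": 2.0},
]
shares = weighted_shares(rows, "q1")
```

In this toy example an unweighted count would report two thirds of respondents answering 1, while the weighted share is one half, which is the kind of discrepancy the note warns about.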
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides a comprehensive collection of synthetic job postings to facilitate research and analysis in the field of job market trends, natural language processing (NLP), and machine learning. Created for educational and research purposes, this dataset offers a diverse set of job listings across various industries and job types.
We would like to express our gratitude to the Python Faker library for its invaluable contribution to the dataset generation process. Additionally, we appreciate the guidance provided by ChatGPT in fine-tuning the dataset, ensuring its quality, and adhering to ethical standards.
Please note that the examples provided are fictional and for illustrative purposes. You can tailor the descriptions and examples to match the specifics of your dataset. This dataset is not suitable for real-world applications and should only be used within the scope of research and experimentation. You can also reach me via email at: rrana157@gmail.com
https://crawlfeeds.com/privacy_policy
Get access to a premium Medium articles dataset containing 500,000+ curated articles with metadata including author profiles, publication dates, reading time, tags, claps, and more. Ideal for natural language processing (NLP), machine learning, content trend analysis, and AI model training.
Request here for the large dataset Medium datasets
Checkout sample dataset in CSV
Training language models (LLMs)
Analyzing content trends and engagement
Sentiment and text classification
SEO research and author profiling
Academic or commercial research
High-volume, cleanly structured JSON
Ideal for developers, researchers, and data scientists
Easy integration with Python, R, SQL, and other data pipelines
Affordable and ready-to-use
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 1. Sample Dataset for Application of Proposed Methodology (data.csv). To protect patient confidentiality, the hospitals providing the example data used in this paper have not given permission for the data to be made publicly available. We have, however, included a limited “fake” version of the dataset. This dataset contains 3 variables - dlp.over indicates whether an exam is “high dose,” sizeC is an ID indicating the combination of anatomic area examined and patient size category, while fac is an ID indicating the hospital the exam was performed in. Information on which ID values are associated with which anatomic areas, patient sizes, and hospital will not be provided, as they are not necessary for the illustration of statistical methods described in the paper. Note that since the dataset made available is different from the dataset used in the paper, the results should be expected to be comparable, but not identical. The software implementing the methods described in this article is available on request from the author.
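As a rough illustration of how the three variables fit together, the sketch below tabulates the share of "high dose" exams for each combination of hospital (`fac`) and anatomic area/size category (`sizeC`). This is not the statistical method from the paper, only a descriptive summary. It assumes `dlp.over` is coded 0/1, which the description does not state, and the sample rows are invented rather than taken from data.csv.

```python
import csv
import io
from collections import defaultdict


def high_dose_rates(csv_text):
    """Return {(fac, sizeC): share of exams flagged high dose}.

    Assumption: dlp.over is 1 for a high-dose exam and 0 otherwise.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [high-dose exams, all exams]
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["fac"], row["sizeC"])
        counts[key][0] += int(row["dlp.over"])
        counts[key][1] += 1
    return {key: high / total for key, (high, total) in counts.items()}


# Invented rows using the column names from the description.
sample = "dlp.over,sizeC,fac\n1,3,1\n0,3,1\n0,3,1\n1,3,1\n1,5,2\n"
rates = high_dose_rates(sample)
```

To run it on the real file, read its contents (for example with `open("data.csv").read()`) and pass the text to `high_dose_rates`.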
https://www.scilifelab.se/data/restricted-access/
Dataset Description This record is a collection of Whole-genome sequencing (WGS), RNA sequencing (RNA-seq), NanoString's nCounter® Breast Cancer 360 (BC360) Panel and cell viability assay data, generated as part of the study “Breast cancer patient-derived whole-tumor cell culture model for efficient drug profiling and treatment response prediction" by Chen et al., 2022. The WGS dataset contains raw sequencing data (BAM files) from tumor scraping cells (TSCs) at the time of surgical resection, derived whole-tumor cell (WTC) cultures from each patient's specimen, and normal skin biopsy for germline control, from five (5) breast cancer (BC) patients. Genomic DNA samples were isolated by using the QIAamp DNA mini kit (QIAGEN). The library was prepared by using Illumina TruSeq PCR-free (350 bp) according to the manufacturer’s protocol. The bulk DNA samples were then sequenced by Illumina Hiseq X and processed via the Science for Life Laboratory CAW workflow version 1.2.362 (Stockholm, Sweden; https://github.com/SciLifeLab/Sarek). The RNA-seq dataset contains raw sequencing data (fastq files) from the TSC pellets at the time of surgical resection, and the pellets of derived WTC cultures with or without tamoxifen metabolites treatment (1 nM 4OHT and 25 nM Z-Endoxifen), from 16 BC patients. 2000 ng RNA was extracted using the RNeasy mini kit (QIAGEN) from each sample, and 1 μg of total RNA was used for rRNA depletion using RiboZero (Illumina). Stranded RNA-seq libraries were constructed using TruSeq Stranded Total RNA Library Prep Kit (Illumina), and paired-end sequencing was performed on HiSeq 2500 with a 2 x 126 setup using the Science for Life Laboratory platform (Stockholm, Sweden). The NanoString's nCounter® BC360 Panel dataset contains normalized data from FFPE tissue samples of 43 BC patients. RNA was extracted from the macrodissected sections using the High Pure FFPET RNA Isolation Kit (Roche) following the manufacturer's protocols. 
Then, 200 ng of RNA per sample were loaded and further analyzed according to the manufacturer’s recommendation on a NanoString nCounter® system using the Breast Cancer 360 code set, which is comprised of 18 housekeeping genes and 752 target genes covering key pathways in tumor biology, microenvironment, and immune response. Raw data was assessed using several quality assurance (QA) metrics to measure imaging quality, oversaturation, and overall signal-to-noise ratio. All samples satisfying QA metric checks were background corrected (background thresholding) using the negative probes and normalized with their mean minus two standard deviations. The background-corrected data were then normalized by calculating the geometric mean of five housekeeper genes, namely ACTB, MRPL19, PSMC4, RPLP0, and SF3A1. The cell viability assay dataset for the main study contains drug sensitivity score (DSS) values for each of the tested drugs derived from the WTC spheroids of 45 BC patients. For patient DP-45, multiple regions were sampled to establish WTCs and perform drug profiling. For the neoadjuvant setting validation study, DSS values correspond to WTCs of 15 BC patients. For the drug profiling assay, each compound covered five concentrations ranging from 10 μM to 1 nM (2 μM to 0.2 nM for trastuzumab and pertuzumab) in 10-fold dilutions and was dispensed using the acoustic liquid handling system Echo 550 (Labcyte Inc) to make spotted 384-well plates. For the neoadjuvant setting validation assay, we updated the cyclophosphamide into its active metabolite form 4-hydroperoxy cyclophosphamide (4-OOH-cyclophosphamide). Each relevant compound covered eight concentrations ranging from 10 μM to 1 nM (2 μM to 0.2 nM for trastuzumab and pertuzumab) and was dispensed using the Tecan D300e Digital Dispenser (Tecan) to make spotted 384-well plates. 
In both experiment settings, a total volume of 40 nl of each compound condition was dispensed into each well, for limiting the final DMSO concentration to 0.1% during the treatment period. Further details on the cell viability assay, as well as the DSS estimation are available in the Materials & Methods part of Chen et al., 2022.
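The housekeeper normalization step described for the BC360 panel can be sketched as follows. The five gene names come from the description; everything else is an assumption for illustration. In particular, the sketch simply divides each background-corrected count by the sample's housekeeper geometric mean, whereas the study's actual pipeline may rescale differently (for example, relative to the average across samples). The `ESR1` entry and all count values are made up.

```python
import math

# Housekeeper genes named in the dataset description.
HOUSEKEEPERS = ("ACTB", "MRPL19", "PSMC4", "RPLP0", "SF3A1")


def geometric_mean(values):
    """Geometric mean of strictly positive values, computed via logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalize_sample(counts):
    """Divide every count in one sample by its housekeeper geometric mean.

    Assumption: counts are already background corrected and positive.
    """
    factor = geometric_mean([counts[gene] for gene in HOUSEKEEPERS])
    return {gene: value / factor for gene, value in counts.items()}


# Invented background-corrected counts for a single sample.
sample = {
    "ACTB": 100.0, "MRPL19": 100.0, "PSMC4": 100.0,
    "RPLP0": 100.0, "SF3A1": 100.0, "ESR1": 250.0,
}
normalized = normalize_sample(sample)
```

The geometric mean is used rather than the arithmetic mean so that a single highly expressed housekeeper does not dominate the scaling factor.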
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: This dataset has been superseded by the dataset found at "End-Use Load Profiles for the U.S. Building Stock" (submission 4520; linked in the submission resources), which is a comprehensive and validated representation of hourly load profiles in the U.S. commercial and residential building stock. The End-Use Load Profiles project website includes links to data viewers for this new dataset. For documentation of dataset validation, model calibration, and uncertainty quantification, see Wilson et al. (2022).
These data were first created around 2012 as a byproduct of various analyses of solar photovoltaics and solar water heating (see references below for two examples). This dataset contains several errors and limitations. It is recommended that users of this dataset transition to the updated version of the dataset posted in the resources. This dataset contains weather data, commercial load profile data, and residential load profile data.
Weather The Typical Meteorological Year 3 (TMY3) provides one year of hourly data for around 1,000 locations. The TMY weather represents 30-year normals, which are typical weather conditions over a 30-year period.
Commercial The commercial load profiles included are the 16 ASHRAE 90.1-2004 DOE Commercial Prototype Models simulated in all TMY3 locations, with building insulation levels changing based on ASHRAE 90.1-2004 requirements in each climate zone. The folder names within each resource represent the weather station location of the profiles, whereas the file names represent the building type and the representative city for the ASHRAE climate zone that was used to determine code compliance insulation levels. As indicated by the file names, all building models represent construction that complied with the ASHRAE 90.1-2004 building energy code requirements. No older or newer vintages of buildings are represented.
Residential The BASE residential load profiles are five EnergyPlus models (one per climate region) representing 2009 IECC construction single-family detached homes simulated in all TMY3 locations. No older or newer vintages of buildings are represented. Each of the five climate regions includes only one heating fuel type; electric heating is only found in the Hot-Humid climate. Air conditioning is not found in the Marine climate region.
One major issue with the residential profiles is that for each of the five climate zones, certain location-specific algorithms from one city were applied to entire climate zones. For example, in the Hot-Humid files, the heating season calculated for Tampa, FL (December 1 - March 31) was unknowingly applied to all other locations in the Hot-Humid zone, which restricts heating operation outside of those days (for example, heating is disabled in Dallas, TX during cold weather in November). This causes the heating energy to be artificially low in colder parts of that climate zone, and conversely the cooling season restriction leads to artificially low cooling energy use in hotter parts of each climate zone. Additionally, the ground temperatures for the representative city were used across the entire climate zone. This affects water heating energy use (because inlet cold water temperature depends on ground temperature) and heating/cooling energy use (because of ground heat transfer through foundation walls and floors). Representative cities were Tampa, FL (Hot-Humid), El Paso, TX (Mixed-Dry/Hot-Dry), Memphis, TN (Mixed-Humid), Arcata, CA (Marine), and Billings, MT (Cold/Very-Cold).
The residential dataset includes a HIGH building load profile that was intended to provide a rough approximation of older home vintages, but it combines poor thermal insulation with larger house size, tighter thermostat setpoints, and less efficient HVAC equipment. Conversely, the LOW building combines excellent thermal insulation with smaller house size, wider thermostat setpoints, and more efficient HVAC equipment. However, it is not known how well these HIGH and LOW permutations represent the range of energy use in the housing stock.
Note that on July 2nd, 2013, the Residential High and Low load files were updated from 366 days in a year for leap years to the more general 365 days in a normal year. The archived residential load data is included from prior to this date.
The data in this accession were collected using seventeen different ships during twenty-four different cruises between July 16, 1971 and April 30, 1990. Real-time bathythermograph (XBT) data of water temperature at varying depths were recorded on strip charts by the US Navy. One box of analog XBT strip charts and radio log sheets was submitted to NODC by Mr. Paul Stevens of Fleet Numerical Oceanography Center, Monterey, CA. The XBT data from these strip charts have been digitized and are held in the current NODC data storage system in C116 file format.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The complex nature of pollutants and the functional potential of microbial communities thriving in the activated sludge of pulp-paper mill wastewater treatment systems remain poorly studied. Activated sludge samples were randomly collected from five different locations within the aeration tanks and combined to form a composite sample for further analysis. Physicochemical characterization and GC-MS analysis of the sludge samples revealed the presence of heavy metals and various chloroorganic pollutants. Simultaneously, high-throughput sequencing (HTS) of the V3–V4 hypervariable region of the 16S rRNA genes derived from the collected sludge sample was performed on the Illumina MiSeq platform. The present dataset includes raw sequencing data (FASTQ files), analysis reports, an OTU table, pie charts, a rarefaction curve, and a Krona chart. Additionally, it provides functional annotation tables derived from the KEGG, EC, and MetaCyc databases using PICRUSt2 analysis. Taxonomic analysis of the metagenomic sequence data revealed that Proteobacteria was the dominant phylum in the sludge. Other phyla, such as Bacteroidetes, Acidobacteria, Planctomycetes, Chloroflexi, Actinobacteria, and Verrucomicrobia, were also recorded, at abundances ranging from 13.27% down to 4.1%. At the genus and species levels, the most abundant genus was an unclassified member (3.62%) of the family Rhodospirillaceae. These unclassified communities could be target taxa in future classification. PICRUSt2-based functional annotation identified a variety of genes, enzymes, and metabolic pathways associated with microbial adaptation and survival in stress-prone environments. This research provides valuable insights into the nature of organic and inorganic pollutants, the composition of bacterial communities, and their potential functional roles in chlorolignin waste-contaminated environments.
Future research should focus on the isolation and enrichment of the promising microbial species characterized in this study, as they hold significant potential for developing effective in situ bioremediation strategies.
The objective of the Rodent Research-23 mission (RR-23) was to better understand the effects of spaceflight on the eyes, specifically on the structure and function of the arteries, veins, and lymphatic vessels that are needed to maintain vision. To this end, twenty male C57BL/6J mice, 16-17 weeks old, were delivered to the ISS on SpaceX-21 in a single transporter, transferred to two rodent habitats, and maintained in microgravity for 38 days. Flight mice were then returned to Earth alive (Jan 13th, 2021). After splashdown in the Atlantic Ocean, mice were transported to Kennedy Space Center via helicopter. The 20 Flight, 20 Habitat Ground Control (HGC), and 20 Vivarium Ground Control (VGC) mice were removed from Rodent Transporters (Flight and HGC) or vivarium cages (VGC), placed into shipping containers, and flown to Texas A&M University. There, mice underwent post-flight procedures before euthanasia and tissue collection. Flight, HGC, and VGC animals were euthanized and dissected on Jan 14th, 17th, or 20th of 2021, respectively. Kidneys were preserved by immersion in RNAlater and stored at -80 ˚C until RNA was extracted and libraries were generated and sequenced (target 60 M clusters per sample, PE 150 bp). This dataset features 9 samples from the Flight group, 9 samples from the HGC group, and 9 samples from the VGC group. A technical replicate is included for one sample in each group. Each replicate consists of an independent library preparation from a single RNA extraction.
Success.ai’s LinkedIn Data Solutions offer unparalleled access to a vast dataset of 700 million public LinkedIn profiles and 70 million LinkedIn company records, making it one of the most comprehensive and reliable LinkedIn datasets available on the market today. Our employee data and LinkedIn data are ideal for businesses looking to streamline recruitment efforts, build highly targeted lead lists, or develop personalized B2B marketing campaigns.
Whether you’re looking for recruiting data, conducting investment research, or seeking to enrich your CRM systems with accurate and up-to-date LinkedIn profile data, Success.ai provides everything you need with pinpoint precision. By tapping into LinkedIn company data, you’ll have access to over 40 critical data points per profile, including education, professional history, and skills.
Key Benefits of Success.ai’s LinkedIn Data: Our LinkedIn data solution offers more than just a dataset. With GDPR-compliant data, AI-enhanced accuracy, and a price match guarantee, Success.ai ensures you receive the highest-quality data at the best price in the market. Our datasets are delivered in Parquet format for easy integration into your systems, and with millions of profiles updated daily, you can trust that you’re always working with fresh, relevant data.
API Integration: Our datasets are easily accessible via API, allowing for seamless integration into your existing systems. This ensures that you can automate data retrieval and update processes, maintaining the flow of fresh, accurate information directly into your applications.
Global Reach and Industry Coverage: Our LinkedIn data covers professionals across all industries and sectors, providing you with detailed insights into businesses around the world. Our geographic coverage spans 259M profiles in the United States, 22M in the United Kingdom, 27M in India, and thousands of profiles in regions such as Europe, Latin America, and Asia Pacific. With LinkedIn company data, you can access profiles of top companies from the United States (6M+), United Kingdom (2M+), and beyond, helping you scale your outreach globally.
Why Choose Success.ai’s LinkedIn Data: Success.ai stands out for its tailored approach and white-glove service, making it easy for businesses to receive exactly the data they need without managing complex data platforms. Our dedicated Success Managers will curate and deliver your dataset based on your specific requirements, so you can focus on what matters most—reaching the right audience. Whether you’re sourcing employee data, LinkedIn profile data, or recruiting data, our service ensures a seamless experience with 99% data accuracy.
Key Use Cases:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset summary
Plankton and detritus are essential components of the Earth’s oceans, influencing biogeochemical cycles and carbon sequestration. Climate change impacts their composition and marine ecosystems as a whole. To improve our understanding of these changes, standardized observation methods and integrated global datasets are needed to enhance the accuracy of ecological and climate models. Here, we present a global dataset for plankton and detritus obtained by two versions of the Underwater Vision Profiler 5 (UVP5). This release contains the images classified in 33 homogenized categories, as well as the metadata associated with them, reaching 3,114 profiles and ca. 8 million objects acquired between 2008 and 2018 at global scale. The geographical distribution of the dataset is unbalanced, with the equatorial region (30° S - 30° N) being the most represented, followed by the high latitudes in the northern hemisphere and lastly the high latitudes in the southern hemisphere. Detritus is the most abundant category in terms of concentration (90%) and biovolume (95%), although its classification in different morphotypes is still not well established. Copepoda was the most abundant taxon within the plankton, with Trichodesmium colonies being the second most abundant. The two versions of UVP5 (SD and HD) have different imagers, resulting in a different effective size range to analyse plankton and detritus from the images (HD objects >600 µm, SD objects >1 mm) and morphological properties (grey levels, etc.) presenting similar patterns, although the ranges may differ. A large number of images of plankton and detritus will be collected in the future by the UVP5, and the public availability of this dataset will allow it to be used as a training set for machine learning and improved by the scientific community. This will reduce uncertainty by classifying previously unclassified objects and expand the classification categories, ultimately enhancing biodiversity quantification.
Data tables
The data set is organised according to:
- samples: Underwater Vision Profiler 5 profiles, taken at a given point in space and time.
- objects: individual UVP images, taken at a given depth along each profile, on which various morphological features were measured and that were then classified taxonomically in EcoTaxa.
Samples and objects have unique identifiers. The sample_id is used to link the different tables of the data set together. All files are tab-separated values, UTF-8 encoded, gzip compressed.
samples.tsv.gz
sample_id
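The table layout described above (gzip-compressed, UTF-8, tab-separated files linked by sample_id) could be read along the following lines. This is only a minimal sketch using the Python standard library: the toy data, the helper names, and every column other than sample_id are hypothetical stand-ins, since the excerpt does not list the full schema.

```python
import csv
import gzip
import io
from collections import defaultdict


def read_tsv_gz(source):
    """Read a gzip-compressed, UTF-8, tab-separated file into a list of dicts."""
    with gzip.open(source, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def group_objects_by_sample(samples, objects):
    """Attach each object row to its parent sample via the shared sample_id."""
    by_sample = defaultdict(list)
    for obj in objects:
        by_sample[obj["sample_id"]].append(obj)
    return {s["sample_id"]: by_sample.get(s["sample_id"], []) for s in samples}


def make_tsv_gz(text):
    """Build an in-memory gzip file so the sketch runs without the real data."""
    return io.BytesIO(gzip.compress(text.encode("utf-8")))


# Toy stand-ins for the real tables; all columns except sample_id are hypothetical.
samples_file = make_tsv_gz("sample_id\tstation\n1\tA\n2\tB\n")
objects_file = make_tsv_gz(
    "object_id\tsample_id\tdepth\n10\t1\t5.0\n11\t1\t7.5\n12\t2\t3.0\n"
)

samples = read_tsv_gz(samples_file)
objects = read_tsv_gz(objects_file)
linked = group_objects_by_sample(samples, objects)

# Number of objects recorded for each profile.
print({sample_id: len(rows) for sample_id, rows in linked.items()})
```

With the real files, a path such as samples.tsv.gz would be passed to read_tsv_gz in place of the in-memory buffer. For a table holding millions of object rows, streaming the rows rather than loading them into a list would likely be preferable.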
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
To assess the impact of the Land Restoration Program and understand which land restoration options work, where, and for whom, there is a need to identify the context-specific variables that may influence the performance of the restoration options as well as their uptake. In addition to monitoring the performance of the restoration option being implemented, a registration of the farmers involved in the project was conducted. A standard household survey was used, assessing both the socio-economic and biophysical characteristics of the households. The farmers were from four districts of Ethiopia: Boset, Gursum, Samre and Tsaeda Emba. The present dataset includes socio-economic data about 173 households, including general information about the farms. Specific data about agricultural operations, crops, trees and the experimental plots developed inside the project are part of a separate dataset. NOTE: The coordinates were removed from the dataset in May 2021, in order to comply with GDPR standards. The location details are available on request: please contact the author and explain the purpose of your research.