100+ datasets found
  1. Additional file 1 of SEDE-GPS: socio-economic data enrichment based on GPS...

    • springernature.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Theodor Sperlea; Stefan Füser; Jens Boenigk; Dominik Heider (2023). Additional file 1 of SEDE-GPS: socio-economic data enrichment based on GPS information [Dataset]. http://doi.org/10.6084/m9.figshare.7405250.v1
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Theodor Sperlea; Stefan Füser; Jens Boenigk; Dominik Heider
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This table contains names, positions, and references for the samples contained in the sequence dataset and whether Prokaryotes and/or Eukaryotes were analyzed from the sample in this study. (CSV 3 kb)

  2. Matlab example for Local Enrichment Analysis (LEA) analysis with real data

    • data.niaid.nih.gov
    • search.dataone.org
    • +1 more
    zip
    Updated Aug 29, 2022
    Cite
    Berend Snijder; Yannik Severin (2022). Matlab example for Local Enrichment Analysis (LEA) analysis with real data [Dataset]. http://doi.org/10.5061/dryad.2jm63xssk
    Available download formats: zip
    Dataset updated
    Aug 29, 2022
    Dataset provided by
    ETH Zurich
    Authors
    Berend Snijder; Yannik Severin
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Phenotypic plasticity is essential to the immune system, yet the factors that shape it are not fully understood. Here, we comprehensively analyze immune cell phenotypes including morphology across human cohorts by single-round multiplexed immunofluorescence, automated microscopy, and deep learning. Using the uncertainty of convolutional neural networks to cluster the phenotypes of 8 distinct immune cell subsets, we find that the resulting maps are influenced by donor age, gender, and blood pressure, revealing distinct polarization and activation-associated phenotypes across immune cell classes. We further associate T-cell morphology to transcriptional state based on their joint donor variability, and validate an inflammation-associated polarized T-cell morphology, and an age-associated loss of mitochondria in CD4+ T-cells. Taken together, we show that immune cell phenotypes reflect both molecular and personal health information, opening new perspectives into the deep immune phenotyping of individual people in health and disease.

    Methods

    This dataset accompanies the manuscript "Multiplexed high-throughput immune cell imaging reveals molecular health-associated phenotypes" by Yannik Severin et al., Science Advances, 2022. It includes:

    • knnlea.m: Matlab function for the presented Local Enrichment Analysis method
    • LEA_Example_Data.mat containing data from the manuscript to reproduce a LEA analysis
    • LEA_Example_Script.mat that runs through the analysis steps
    • README.txt

  3. Consumer Marketing Data, B2C Consumer Address Enrichment, USA, CCPA...

    • datarade.ai
    .json, .csv
    Updated Mar 11, 2023
    + more versions
    Cite
    Versium (2023). Consumer Marketing Data, B2C Consumer Address Enrichment, USA, CCPA Compliant [Dataset]. https://datarade.ai/data-products/versium-reach-b2c-consumer-address-enrichment-usa-gdpr-an-versium
    Available download formats: .json, .csv
    Dataset updated
    Mar 11, 2023
    Dataset authored and provided by
    Versium
    Area covered
    United States
    Description

    With Versium REACH's Contact Append or Contact Append Plus, you can add consumer contact data, including multiple phone numbers or mobile-only numbers, to your list of customers or prospects. Versium REACH connects you to our proprietary database of more than 300 million consumers, 1 billion emails, and more than 150 million households in the United States. Through either our API or our platform, you can have contact data appended to your records when you supply any of the following values:

    • Email Address
    • Phone
    • Postal Address, City, State, ZIP
    • First Name, Last Name, City, State
    • First Name, Last Name, ZIP

  4. Additional file 20: of MGSEA – a multivariate Gene set enrichment analysis...

    • springernature.figshare.com
    zip
    Updated Jun 2, 2023
    Cite
    Khong-Loon Tiong; Chen-Hsiang Yeang (2023). Additional file 20: of MGSEA – a multivariate Gene set enrichment analysis [Dataset]. http://doi.org/10.6084/m9.figshare.7861256.v1
    Available download formats: zip
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Khong-Loon Tiong; Chen-Hsiang Yeang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supplementary File S1. The R source codes of the MGSEA program, a toy example dataset, and a brief explanation for running the program. (ZIP 1832 kb)

  5. Small Molecule-Protein Interaction Data

    • kaggle.com
    zip
    Updated Apr 19, 2024
    Cite
    Indranil Bhattacharyya (2024). Small Molecule-Protein Interaction Data [Dataset]. https://www.kaggle.com/datasets/photon98/leash-bio-engineered-data-training
    Available download formats: zip (0 bytes)
    Dataset updated
    Apr 19, 2024
    Authors
    Indranil Bhattacharyya
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    About the Dataset and How I augmented the data:

    The dataset used in this augmentation process (a subset of the original training data) is sourced from the Leash Bio - Predict New Medicines with BELKA competition. It comprises examples of small molecules labeled for binary classification: each label indicates whether the molecule binds to one of three protein targets. The data were collected using DNA-encoded chemical library (DEL) technology.

    Chemical representations are expressed in SMILES (Simplified Molecular-Input Line-Entry System), while the labels denote binary binding classifications, corresponding to three distinct protein targets.

    I've expanded the original dataset by augmenting it with additional features derived from the existing data. Specifically, I've calculated and included three new features:

    • mol_wt (Molecular Weight): Calculated based on the SMILES data using RDKit, providing insight into the mass of each molecule.
    • logP (Partition Coefficient): Also derived from the SMILES data using RDKit, representing the logarithm of the partition coefficient, a measure of a molecule's hydrophobicity and its ability to partition between a hydrophobic solvent and water.
    • rotamers (Number of Rotamers): Determined from the SMILES data using RDKit, indicating the number of distinct conformations or rotational isomers a molecule can adopt.

    These additional features aim to enrich the feature matrix, potentially enhancing the predictive power of models trained on the augmented dataset.

    Data Description:

    • id - A unique example_id we use to identify the molecule-binding target pair.
    • buildingblock1_smiles - The structure, in SMILES, of the first building block.
    • buildingblock2_smiles - The structure, in SMILES, of the second building block.
    • buildingblock3_smiles - The structure, in SMILES, of the third building block.
    • molecule_smiles - The structure of the fully assembled molecule, in SMILES. This includes the three building blocks and the triazine core. Note we use a [Dy] as the stand-in for the DNA linker.
    • protein_name - The protein target name.
    • binds - The target column. A binary class label of whether the molecule binds to the protein. Not available for the test set.
    • mol_wt - The molecule's molecular weight derived from SMILES data using RDKit.
    • logP - The logP of the molecule derived from SMILES data using RDKit.
    • rotamers - The number of rotamers of the molecule derived from SMILES data using RDKit.

    Targets: binds

    Proteins are encoded in the genome, and names of the genes encoding those proteins are typically bestowed by their discoverers and regulated by the HUGO Gene Nomenclature Committee. The protein products of these genes can sometimes have different names, often due to the history of their discovery.

  6. Data from: Argon data for enriched MORB from the 8°20' N seamount chain

    • s.cnmilf.com
    • data.usgs.gov
    • +1 more
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Argon data for enriched MORB from the 8°20' N seamount chain [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/argon-data-for-enriched-morb-from-the-820-n-seamount-chain
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    This dataset accompanies planned publication 'Near-Ridge Magmatism Constrained Using 40Ar/39Ar Dating of Enriched MORB from the 8°20' N Seamount Chain'. The Ar/Ar data are for samples that record the volcanic history of the area. The geochronology provides time constraints for the eruption of rocks studied in the manuscript. Samples were collected from the 8°20' N seamount chain by Molly Anderson (University of Florida), who sent them to the USGS Denver Argon Geochronology Laboratory for Ar/Ar analysis.

  7. EESQ and EESQ-M reliability and validity raw data

    • data.mendeley.com
    Updated Dec 2, 2024
    Cite
    nik nasihah nik ramli (2024). EESQ and EESQ-M reliability and validity raw data [Dataset]. http://doi.org/10.17632/ct74hk8wbw.1
    Dataset updated
    Dec 2, 2024
    Authors
    nik nasihah nik ramli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains raw data from the pilot study samples used for the validity and reliability testing of the Environmental Enrichment Scale Questionnaire (EESQ) and its translated Malay version (EESQ-M).

  8. Enhancing MovieLens Dataset: Enriching Recommendations with Audio...

    • zenodo.org
    • data.niaid.nih.gov
    bin, zip
    Updated Jun 16, 2023
    Cite
    Victor Botti-Cebria; Laura Sebastia; Vanessa Moscardo (2023). Enhancing MovieLens Dataset: Enriching Recommendations with Audio Information, Transcriptions, and Metadata [Dataset]. http://doi.org/10.5281/zenodo.8037433
    Available download formats: zip, bin
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Victor Botti-Cebria; Laura Sebastia; Vanessa Moscardo
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Nowadays, there are lots of datasets available for training and experimentation in the field of recommender systems. Specifically, in the recommendation of audiovisual content, the MovieLens dataset is a prominent example. It is focused on the user-item relationship, providing actual interaction data between users and movies. However, although movies can be described with several characteristics, this dataset only offers limited information about the movie genres.

    In this work, we propose enriching the MovieLens dataset by incorporating metadata available on the web (such as cast, description, keywords, etc.) and movie trailers. By leveraging the trailers, we extract audio information and generate transcriptions for each trailer, introducing a crucial textual dimension to the dataset. The audio information was extracted through waveform and frequency analysis, followed by the application of dimensionality reduction techniques. For the transcription generation, the deep learning model Whisper was used. Finally, metadata was obtained from TMDB, and the BERT model was applied to extract embeddings.

    These additional attributes enrich the original dataset, enabling deeper and more precise analysis. The use of this extended and enhanced dataset could therefore drive significant advancements in recommendation systems, enhancing user experiences by providing more relevant and tailored movie recommendations based on users' tastes and preferences.

  9. Factori USA Consumer Graph Data | socio-demographic, location, interest and...

    • datarade.ai
    .json, .csv
    Updated Jul 23, 2022
    + more versions
    Cite
    Factori (2022). Factori USA Consumer Graph Data | socio-demographic, location, interest and intent data | E-Commere |Mobile Apps | Online Services [Dataset]. https://datarade.ai/data-products/factori-usa-consumer-graph-data-socio-demographic-location-factori
    Available download formats: .json, .csv
    Dataset updated
    Jul 23, 2022
    Dataset authored and provided by
    Factori
    Area covered
    United States of America
    Description

    Our consumer data is gathered and aggregated via surveys, digital services, and public data sources. We use powerful profiling algorithms to collect and ingest only fresh and reliable data points.

    Our comprehensive data enrichment solution includes a variety of data sets that can help you address gaps in your customer data, gain a deeper understanding of your customers, and power superior client experiences.

    1. Geography - City, State, ZIP, County, CBSA, Census Tract, etc.
    2. Demographics - Gender, Age Group, Marital Status, Language, etc.
    3. Financial - Income Range, Credit Rating Range, Credit Type, Net worth Range, etc.
    4. Persona - Consumer type, Communication preferences, Family type, etc.
    5. Interests - Content, Brands, Shopping, Hobbies, Lifestyle, etc.
    6. Household - Number of Children, Number of Adults, IP Address, etc.
    7. Behaviours - Brand Affinity, App Usage, Web Browsing, etc.
    8. Firmographics - Industry, Company, Occupation, Revenue, etc.
    9. Retail Purchase - Store, Category, Brand, SKU, Quantity, Price, etc.
    10. Auto - Car Make, Model, Type, Year, etc.
    11. Housing - Home type, Home value, Renter/Owner, Year Built, etc.

    Consumer Graph Schema & Reach: Our data reach represents the total number of counts available within various categories and comprises attributes such as country location, MAU, DAU & Monthly Location Pings:

    Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method on a suitable interval (daily/weekly/monthly).

    Consumer Graph Use Cases:

    360-Degree Customer View: Get a comprehensive image of customers by means of internal and external data aggregation.

    Data Enrichment: Leverage online-to-offline consumer profiles to build holistic audience segments and improve campaign targeting through user data enrichment.

    Fraud Detection: Use multiple digital (web and mobile) identities to verify real users and detect anomalies or fraudulent activity.

    Advertising & Marketing: Understand audience demographics, interests, lifestyle, hobbies, and behaviors to build targeted marketing campaigns.

    Using the Factori Consumer Data graph, you can solve use cases like:

    Acquisition Marketing: Expand your reach to new users and customers using lookalike modeling with your first-party audiences to extend to other potential consumers with similar traits and attributes.

    Lookalike Modeling: Build lookalike audience segments using your first-party audiences as a seed to extend your reach for running marketing campaigns to acquire new users or customers.

    Other use cases include CRM Data Enrichment, Consumer Data Enrichment, B2B Data Enrichment, B2C Data Enrichment, Customer Acquisition, Audience Segmentation, 360-Degree Customer View, Consumer Profiling, and Consumer Behaviour Data.

    Here's the schema of Consumer Data: person_id first_name last_name age gender linkedin_url twitter_url facebook_url city state address zip zip4 country delivery_point_bar_code carrier_route walk_seuqence_code fips_state_code fips_country_code country_name latitude longtiude address_type metropolitan_statistical_area core_based+statistical_area census_tract census_block_group census_block primary_address pre_address streer post_address address_suffix address_secondline address_abrev census_median_home_value home_market_value property_build+year property_with_ac property_with_pool property_with_water property_with_sewer general_home_value property_fuel_type year month household_id Census_median_household_income household_size marital_status length+of_residence number_of_kids pre_school_kids single_parents working_women_in_house_hold homeowner children adults generations net_worth education_level occupation education_history credit_lines credit_card_user newly_issued_credit_card_user credit_range_new
    credit_cards loan_to_value mortgage_loan2_amount mortgage_loan_type mortgage_loan2_type mortgage_lender_code mortgage_loan2_render_code mortgage_lender mortgage_loan2_lender mortgage_loan2_ratetype mortgage_rate mortgage_loan2_rate donor investor interest buyer hobby personal_email work_email devices phone employee_title employee_department employee_job_function skills recent_job_change company_id company_name company_description technologies_used office_address office_city office_country office_state office_zip5 office_zip4 office_carrier_route office_latitude office_longitude office_cbsa_code office_census_block_group office_census_tract office_county_code company_phone company_credit_score company_csa_code company_dpbc company_franchiseflag company_facebookurl company_linkedinurl company_twitterurl company_website company_fortune_rank company_government_type company_headquarters_branch company_home_business company_industry company_num_pcs_used company_num_employees company_firm_individual company_msa company_msa_name company_naics_code company_naics_description company_naics_code2 company_naics_description2 company_sic_code2 company_sic_code2_desc...

  10. Data from: Large-Scale Learning of Structure−Activity Relationships Using a...

    • acs.figshare.com
    zip
    Updated May 30, 2023
    Cite
    Georg Hinselmann; Lars Rosenbaum; Andreas Jahn; Nikolas Fechner; Claude Ostermann; Andreas Zell (2023). Large-Scale Learning of Structure−Activity Relationships Using a Linear Support Vector Machine and Problem-Specific Metrics [Dataset]. http://doi.org/10.1021/ci100073w.s001
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    ACS Publications
    Authors
    Georg Hinselmann; Lars Rosenbaum; Andreas Jahn; Nikolas Fechner; Claude Ostermann; Andreas Zell
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The goal of this study was to adapt a recently proposed linear large-scale support vector machine to large-scale binary cheminformatics classification problems and to assess its performance on various benchmarks using virtual screening performance measures. We extended the large-scale linear support vector machine library LIBLINEAR with state-of-the-art virtual high-throughput screening metrics to train classifiers on whole large and unbalanced data sets. The formulation of this linear support vector machine performs excellently when applied to high-dimensional sparse feature vectors. An additional advantage is the average linear complexity in the number of non-zero features of a prediction. Nevertheless, the approach assumes that a problem is linearly separable. Therefore, we conducted extensive benchmarking to evaluate the performance on large-scale problems up to a size of 175000 samples. To examine the virtual screening performance, we determined the chemotype clusters using Feature Trees and integrated this information to compute weighted AUC-based performance measures and a leave-cluster-out cross-validation. We also considered the BEDROC score, a metric that was suggested to tackle the early enrichment problem. The performance on each problem was evaluated by a nested cross-validation and a nested leave-cluster-out cross-validation. We compared LIBLINEAR against a Naïve Bayes classifier, a random decision forest classifier, and a maximum similarity ranking approach. LIBLINEAR outperformed these reference approaches in a direct comparison. A comparison to literature results showed that the LIBLINEAR performance is competitive, although it does not match the top-ranked nonlinear machines on these benchmarks. However, considering the overall convincing performance and computation time of the large-scale support vector machine, the approach provides an excellent alternative to established large-scale classification approaches.
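
    The leave-cluster-out cross-validation mentioned in this description can be illustrated with a minimal sketch. It only shows the splitting idea (every compound of one chemotype cluster is held out together); it is not the authors' implementation and does not involve LIBLINEAR or Feature Trees. The function name and the cluster labels are hypothetical.

```python
from collections import defaultdict


def leave_cluster_out_splits(cluster_ids):
    """Yield (held_out_cluster, train_indices, test_indices), one fold per cluster."""
    by_cluster = defaultdict(list)
    for index, cluster in enumerate(cluster_ids):
        by_cluster[cluster].append(index)
    for held_out in sorted(by_cluster):
        test_idx = by_cluster[held_out]
        # Training data is every sample that belongs to any other cluster.
        train_idx = sorted(
            i for cluster, members in by_cluster.items()
            if cluster != held_out
            for i in members
        )
        yield held_out, train_idx, test_idx


# Hypothetical chemotype cluster labels for six compounds.
clusters = ["A", "A", "B", "C", "B", "C"]
folds = list(leave_cluster_out_splits(clusters))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

    Because no cluster appears on both sides of a fold, a classifier is always evaluated on chemotypes it has not seen during training, which is the usual motivation for this kind of split.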

  11. Gene set enrichment data files

    • figshare.com
    txt
    Updated Oct 27, 2022
    Cite
    Leon French (2022). Gene set enrichment data files [Dataset]. http://doi.org/10.6084/m9.figshare.21404907.v3
    Available download formats: txt
    Dataset updated
    Oct 27, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Leon French
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Files:

    • example_GO_groups.csv: example Gene Ontology group to gene symbol mapping.

  12. Data from: Assessment of targeted enrichment locus capture across time and...

    • datadryad.org
    • search.dataone.org
    • +2 more
    zip
    Updated May 18, 2023
    Cite
    Rhema Uche-Dike; Aaron Goodman; Ethan Tolman; John Abbott; Jesse Breinholt; Seth Bybee; Paul Frandsen; Stephen Gosnell; Rob Guralnick; Vincent Kalkman; Manpreet Kohli; Judicael Fomekong-Lontchi; Pungki Lupiyaningdyah; Lacie Newton; Jessica Ware (2023). Assessment of targeted enrichment locus capture across time and museums using odonate specimens [Dataset]. http://doi.org/10.5061/dryad.kprr4xh8z
    Available download formats: zip
    Dataset updated
    May 18, 2023
    Dataset provided by
    Dryad
    Authors
    Rhema Uche-Dike; Aaron Goodman; Ethan Tolman; John Abbott; Jesse Breinholt; Seth Bybee; Paul Frandsen; Stephen Gosnell; Rob Guralnick; Vincent Kalkman; Manpreet Kohli; Judicael Fomekong-Lontchi; Pungki Lupiyaningdyah; Lacie Newton; Jessica Ware
    Time period covered
    2023
    Description

    • IQ-Tree v.2.1.3 (Data matrix - fasta file)
    • UNIX/Command line or a Text Editor for viewing (fastq files - raw data)
    • FigTree (Tree file - .treefile)
    • BBEdit (Partition files - Nexus)

  13. MAT-Builder datasets

    • data.niaid.nih.gov
    Updated Apr 19, 2023
    Cite
    Chiara Pugliese (2023). MAT-Builder datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7839805
    Dataset updated
    Apr 19, 2023
    Dataset provided by
    Chiara Pugliese
    Chiara Renso
    Francesco Lettich
    Fabio Pinelli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The archive contains two datasets that have been used to empirically evaluate MAT-Builder, a system to generate multiple aspect trajectories.

    The first one is located in the "rome" folder and contains 26395 trajectories from 3181 individuals. The trajectories move over the city of Rome and were collected from OpenStreetMap. The folder also contains auxiliary datasets, i.e., the set of POIs within the boundaries of the province of Rome (downloaded from OpenStreetMap; see the "poi" subfolder), historical weather information (downloaded from Meteostat, https://meteostat.net/it/; see the "weather" subfolder), and a synthetically generated dataset of social media posts from the individuals (see the "tweets" subfolder). All the datasets are pandas DataFrames, except for the POI dataset, which is a GeoPandas GeoDataFrame. All the datasets are stored in the Parquet format.

    The second one is located in the "geolife" folder and contains the GeoLife dataset. The dataset contains 17621 trajectories from 178 users. The timestamps of the trajectory samples have been adjusted from the GMT to the GMT+8 timezone. As with the first dataset, this folder also contains a dataset of POIs, a dataset of historical weather information, and a dataset of social media posts that were generated synthetically.
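
    The timestamp adjustment described above (GMT to GMT+8) can be sketched with the Python standard library. This is only an illustration of such a shift, not the code used to prepare the dataset; the helper name and the sample timestamp are made up.

```python
from datetime import datetime, timedelta, timezone

GMT = timezone.utc
GMT_PLUS_8 = timezone(timedelta(hours=8))


def to_gmt_plus_8(naive_gmt_timestamp):
    """Interpret a naive timestamp as GMT and return it in GMT+8."""
    return naive_gmt_timestamp.replace(tzinfo=GMT).astimezone(GMT_PLUS_8)


# Made-up trajectory sample time, assumed to be recorded in GMT.
sample = datetime(2008, 10, 23, 2, 53, 4)
shifted = to_gmt_plus_8(sample)
print(shifted.isoformat())  # 2008-10-23T10:53:04+08:00
```

    Attaching an explicit offset, rather than adding eight hours to a naive value, keeps the result unambiguous if the timestamps are later compared with data in another timezone.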

    For more information on the MAT-Builder project (i.e., published papers, how to use the datasets, how the information within the datasets is structured, and so on), we refer to MAT-Builder's GitHub page: https://github.com/chiarap2/MAT_Builder.

  14. Clust_100_GE_datasets

    • zenodo.org
    zip
    Updated Jan 24, 2020
    + more versions
    Cite
    Basel Abu-Jamous; Steven Kelly (2020). Clust_100_GE_datasets [Dataset]. http://doi.org/10.5281/zenodo.1298541
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Basel Abu-Jamous; Steven Kelly
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    100 microarray and RNA-seq gene expression datasets from five model species (human, mouse, fruit fly, arabidopsis plants, and baker's yeast). These datasets represent the benchmark set that was used to test our clust clustering method and to compare it with seven widely used clustering methods (Cross-Clustering, k-means, self-organising maps, MCL, hierarchical clustering, CLICK, and WGCNA). This data resource includes raw data files, pre-processed data files, clustering results, clustering results evaluation, and scripts.

    The files are split into eight zipped parts, 100Datasets_0.zip to 100Datasets_7.zip. The contents of all the zipped files should be extracted to a single folder (e.g. 100Datasets).

    Below is a thorough description of the files and folders in this data resource.

    Scripts

    The scripts used to apply each one of the clustering methods to each one of the 100 datasets and to evaluate their results are all included in the folder (scripts/).

    Datasets and clustering results (folders starting with D)

    The datasets are labelled as D001 to D100. Each dataset has two folders: D###/ and D###_Res/, where ### is the number of the dataset. The first folder only includes the raw dataset while the second folder includes the results of applying the clustering methods to that dataset. The files ending with _B.tsv include clustering results in the form of a partition matrix. The files ending with _E include metrics evaluating the clustering results. The files ending with _go and _go_E respectively include the enriched GO terms in the clustering results and evaluation metrics of these GO terms. The files ending with _REACTOME and _REACTOME_E are similar to the GO term files but for the REACTOME pathway enrichment analysis. Each of these D###_Res/ folders includes a sub-folder "ParamSweepClust" which includes the results of applying clust multiple times to the same dataset while sweeping some parameters.
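
    The naming convention described above can be made concrete with a short sketch. It builds the two folder names for a dataset number and maps a result file name to the meaning given in the text. The helper names are hypothetical, and it is an assumption that any file extension (such as .tsv) follows the suffix.

```python
def dataset_folders(number):
    """Return the (raw data, results) folder names for dataset `number` (1 to 100)."""
    if not 1 <= number <= 100:
        raise ValueError("dataset number must be between 1 and 100")
    label = f"D{number:03d}"
    return f"{label}/", f"{label}_Res/"


# More specific suffixes come first so that "_go_E" is not read as plain "_E".
RESULT_SUFFIXES = [
    ("_REACTOME_E", "evaluation metrics of enriched REACTOME pathways"),
    ("_go_E", "evaluation metrics of enriched GO terms"),
    ("_REACTOME", "enriched REACTOME pathways"),
    ("_go", "enriched GO terms"),
    ("_B", "partition matrix"),
    ("_E", "clustering evaluation metrics"),
]


def describe_result_file(filename):
    """Map a result file name to its described content, ignoring any extension."""
    stem = filename.rsplit(".", 1)[0] if "." in filename else filename
    for suffix, meaning in RESULT_SUFFIXES:
        if stem.endswith(suffix):
            return meaning
    return "unknown"


print(dataset_folders(7))
print(describe_result_file("kmeans_go_E"))  # hypothetical file name
```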

    Large datasets analysis results

    The folder LargeDatasets/ includes data and results for what we refer to as "large" datasets. These are 19 datasets that have more than 50 samples including replicates and have therefore not been included in the set of 100 datasets. However, they fit all of the other dataset selection criteria. We have compared clust with the other clustering methods over these datasets to demonstrate that clust still outperforms other methods over larger datasets. This folder includes folders LD001/ to LD019/ and LD001_Res/ to LD019_Res/. These have similar format and contents as the D###/ and D###_Res/ folders described above.

    Simultaneous analysis of multiple datasets (folders starting with MD)

    As our clust method is designed to extract clusters from multiple datasets simultaneously, we also tested it over multiple datasets. All folders starting with MD_ are related to "multiple datasets (MD)" results. Each MD experiment simultaneously analyses d randomly selected datasets either out of a set of 10 arabidopsis datasets or out of a set of 10 yeast datasets. For each one of the two species, all d values from 2 to 10 were tested, and at each one of these d values, 10 different runs were conducted, where at each run a different subset of d datasets is selected randomly.

    The folders MD_10A and MD_10Y include the full sets of 10 arabidopsis or 10 yeast datasets, respectively. Each folder with the format MD_10#_d#_Res## includes the results of applying the eight clustering methods at one of the 10 random runs of one of the selected d values. For example, the "MD_10A_d4_Res03/" folder includes the clustering results of the 3rd random selection of 4 arabidopsis datasets (the letter A in the folder's name refers to arabidopsis).

    Our clust method is applied directly over multiple datasets where each dataset is in a separate data file. Each "MD_10#_d#_Res##" folder includes these individual files in a sub-folder named "Processed_Data/". However, the other clustering methods only accept a single input data file. Therefore, the datasets are merged first before being submitted to these methods. Each "MD_10#_d#_Res##" folder includes a file "X_merged.tsv" for the merged data.
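
    The MD experiment design described above (d from 2 to 10, with 10 random runs per value of d) can be sketched as follows. The random seed and the numbering of datasets from 1 to 10 are assumptions made for illustration, so the subsets produced here do not correspond to the selections stored in the archive.

```python
import random

N_DATASETS = 10   # size of each species pool (arabidopsis or yeast)
RUNS_PER_D = 10   # number of random runs for every value of d


def md_experiments(species_letter, seed=0):
    """Map each MD results folder name to a randomly chosen subset of datasets."""
    rng = random.Random(seed)  # arbitrary seed, used only for reproducibility
    experiments = {}
    for d in range(2, N_DATASETS + 1):
        for run in range(1, RUNS_PER_D + 1):
            folder = f"MD_10{species_letter}_d{d}_Res{run:02d}/"
            # Draw d distinct dataset numbers out of the pool of 10.
            experiments[folder] = sorted(rng.sample(range(1, N_DATASETS + 1), d))
    return experiments


arabidopsis = md_experiments("A")
print(len(arabidopsis))                  # 9 values of d times 10 runs
print(arabidopsis["MD_10A_d4_Res03/"])   # one subset of 4 dataset numbers
```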

    Evaluation metrics (folders starting with Metrics)

    Each clustering results folder (D###_Res or MD_10#_d#_Res##) includes some clustering evaluation files ending with _E. This information is combined into tables for all datasets, and these tables appear in the folders starting with "Metrics_".

    Other files and folders

    The GO folder includes the reference GO term annotations for arabidopsis and yeast. Similarly, the REACTOME folder includes the reference REACTOME pathway annotations for arabidopsis and yeast. The Datasets file includes a tab-delimited table describing the 100 datasets. The SearchCriterion file includes the objective methodology of searching the NCBI database to select these 100 datasets. The Specials file includes some special considerations for a couple of datasets that differ a bit from what is described in the SearchCriterion file. The Norm### files and the files in the Reps/ folder describe normalisation codes and replicate structures for the datasets and were fed to the clust method as inputs. The Plots/ folder includes plots of the gene expression profiles of the individual genes in the clusters generated by each one of the eight methods over each one of the 100 datasets. Only up to 14 clusters per method are plotted.

  15. Employee Data | The Largest Dataset Of Active Profiles | Global / 1B Records...

    • datarade.ai
    .json
    Updated Apr 19, 2025
    Cite
    Avanteer (2025). Employee Data | The Largest Dataset Of Active Profiles | Global / 1B Records / Updated Daily [Dataset]. https://datarade.ai/data-products/employee-data-the-largest-dataset-of-active-profiles-glob-avanteer
    Available download formats: .json
    Dataset updated
    Apr 19, 2025
    Dataset authored and provided by
    Avanteer
    Area covered
    Fiji, Gambia, Nicaragua, Pitcairn, United Arab Emirates, Tunisia, Bulgaria, Maldives, Anguilla, State of
    Description

    //// 🌍 Avanteer Employee Data ////

    The Largest Dataset of Active Global Profiles 1B+ Records | Updated Daily | Built for Scale & Accuracy

    Avanteer’s Employee Data offers unparalleled access to the world’s most comprehensive dataset of active professional profiles. Designed for companies building data-driven products or workflows, this resource supports recruitment, lead generation, enrichment, and investment intelligence — with unmatched scale and update frequency.

    //// 🔧 What You Get ////

    1B+ active profiles across industries, roles, and geographies

    Work history, education history, languages, skills and multiple additional datapoints.

    AI-enriched datapoints include: gender, age, normalized seniority, normalized department, normalized skillset, and MBTI assessment.

    Daily updates, with change-tracking fields to capture job changes, promotions, and new entries.

    Flexible delivery via API, S3, or flat file.

    Choice of formats: raw, cleaned, or AI-enriched.

    Built-in compliance aligned with GDPR and CCPA.

    //// 💡 Key Use Cases ////

    ✅ Smarter Talent Acquisition Identify, enrich, and engage high-potential candidates using up-to-date global profiles.

    ✅ B2B Lead Generation at Scale Build prospecting lists with confidence using job-related and firmographic filters to target decision-makers across verticals.

    ✅ Data Enrichment for SaaS & Platforms Supercharge ATS, CRMs, or HR tech products by syncing enriched, structured employee data through real-time or batch delivery.

    ✅ Investor & Market Intelligence Analyze team structures, hiring trends, and senior leadership signals to discover early-stage investment opportunities or evaluate portfolio companies.

    //// 🧰 Built for Top-Tier Teams Who Move Fast ////

    Zero duplicates, by design

    <300ms API response time

    99.99% guaranteed API uptime

    Onboarding support including data samples, test credits, and consultations

    Advanced data quality checks

    //// ✅ Why Companies Choose Avanteer ////

    ➔ The largest daily-updated dataset of global professional profiles

    ➔ Trusted by sales, HR, and data teams building at enterprise scale

    ➔ Transparent, compliant data collection with opt-out infrastructure baked in

    ➔ Dedicated support with fast onboarding and hands-on implementation help

    ////////////////////////////////

    Empower your team with reliable, current, and scalable employee data — all from a single source.

  16. Teosto Open Api – open interface for live music data

    • data.europa.eu
    unknown
    Updated Dec 6, 2023
    Cite
    Yhdistykset ja säätiöt (2023). Teosto Open Api – open interface for live music data [Dataset]. https://data.europa.eu/data/datasets/3c7de080-ea97-4ddb-9a26-218579825170?locale=en
    Available download formats: unknown
    Dataset updated
    Dec 6, 2023
    Dataset authored and provided by
    Yhdistykset ja säätiöt
    Description

    The live music data collected by Teosto is the largest and most comprehensive in Finland. The data opened through the open interface now includes all live gigs announced to Teosto in Finland last year (2014): the dates of the gigs, the venues with their location and coordinates, the performers, the songs presented and the authors of the songs.

    We challenge developers to enrich live music spatial data and develop new, innovative uses for it. Examples of data enrichment include combining other open spatial datasets with event data or music-related metadata with song-specific data.

    The development of live data is part of the Open Finland Challenge competition and the Ultrahack event.

  17. NBA WNBA play-by-play and shots data

    • kaggle.com
    zip
    Updated Jun 26, 2025
    Cite
    Vladislav Shufinskiy (2025). NBA WNBA play-by-play and shots data [Dataset]. https://www.kaggle.com/datasets/brains14482/nba-playbyplay-and-shotdetails-data-19962021
    Available download formats: zip (1683596108 bytes)
    Dataset updated
    Jun 26, 2025
    Authors
    Vladislav Shufinskiy
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    The NBA and WNBA dataset is a large-scale play-by-play and shot-detail dataset covering both NBA and WNBA games, collected from multiple public sources (e.g., official league APIs and stats sites). It provides every in-game event, from period starts, jump balls, fouls, turnovers, rebounds, and field-goal attempts through free throws, along with detailed shot metadata (shot location, distance, result, assisting player, etc.).

    You can also download the dataset from GitHub or Google Drive.

    Tutorials

    1. NBA play-by-play dataset R example

    I will be grateful for ratings and stars on GitHub, but the best thanks is using the dataset in your projects.

    Useful links:

    Motivation

    I made this dataset because I want to simplify and speed up work with play-by-play data so that researchers spend their time studying data, not collecting it. Because of the request limits on the NBA and WNBA websites, and because each request returns the play-by-play of only one game, collecting this data is a very long process.

    Using this dataset, you can reduce the time to get information about one season from a few hours to a couple of seconds and spend more time analyzing data or building models.

    I also added play-by-play information from other sources: pbpstats.com, data.nba.com, cdnnba.com. This data will enrich information about the progress of each game and hopefully add opportunities to do interesting things.

    Contact Me

    If you have any questions or suggestions about the dataset, you can write to me in a convenient channel for you:

  18. Phone Number Data | Global Coverage | 100M+ B2B Mobile Phone Numbers | 95%+...

    • datarade.ai
    .json, .csv
    + more versions
    Cite
    Forager.ai, Phone Number Data | Global Coverage | 100M+ B2B Mobile Phone Numbers | 95%+ Accuracy [Dataset]. https://datarade.ai/data-products/global-mobile-phone-number-data-90m-95-accuracy-api-b-forager-ai-905f
    Available download formats: .json, .csv
    Dataset provided by
    Forager.ai
    Area covered
    Botswana, Martinique, South Georgia and the South Sandwich Islands, Macedonia (the former Yugoslav Republic of), Japan, Colombia, United Arab Emirates, Uruguay, Moldova (Republic of), Cambodia
    Description

    Global B2B Mobile Phone Number Database | 100M+ Verified Contacts | 95% Accuracy

    Forager.ai provides the world’s most reliable mobile phone number data for businesses that refuse to compromise on quality. With 100 million+ professionally verified mobile numbers refreshed every 3 weeks, our database ensures 95% accuracy – so your teams never waste time on dead-end leads.

    Why Our Data Wins

    ✅ Accuracy You Can Trust 95% of mobile numbers are verified against live carrier records and tied to current job roles. Say goodbye to “disconnected number” voicemails.

    ✅ Depth Beyond Digits Each contact includes 150+ data points:

    Direct mobile numbers

    Current job title, company, and department

    Full career history + education background

    Location data + LinkedIn profiles

    Company size, industry, and revenue

    ✅ Freshness Guaranteed Bi-weekly updates combat job-hopping and role changes – critical for sales teams targeting decision-makers.

    ✅ Ethically Sourced & Compliant First-party collected data with full GDPR/CCPA compliance.

    Who Uses This Data?

    Sales Teams: Cold-call C-suite prospects with verified mobile numbers.

    Marketers: Run hyper-personalized SMS/WhatsApp campaigns.

    Recruiters: Source passive candidates with up-to-date contact intel.

    Data Vendors: License premium datasets to enhance your product.

    Tech Platforms: Power your SaaS tools via API with enterprise-grade B2B data.

    Flexible Delivery, Instant Results

    API (REST): Real-time integration for CRMs, dialers, or marketing stacks

    CSV/JSON: Campaign-ready files.

    PostgreSQL: Custom databases for large-scale enrichment

    Compliance: Full audit trails + opt-out management

    Why Forager.ai?

    → Proven ROI: Clients see 62% higher connect rates vs. industry averages (request case studies).

    → No Guesswork: Test-drive free samples before committing.

    → Scalable Pricing: Pay per record, license datasets, or get unlimited API access.

    B2B Mobile Phone Data | Verified Contact Database | Sales Prospecting Lists | CRM Enrichment | Recruitment Phone Numbers | Marketing Automation | Phone Number Datasets | GDPR-Compliant Leads | Direct Dial Contacts | Decision-Maker Data

    Need Proof? Contact us to see why Fortune 500 companies and startups alike trust Forager.ai for mission-critical outreach.

  19. Data from: Enriching the ant tree of life: enhanced UCE bait set for...

    • datadryad.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Feb 14, 2017
    Cite
    Michael G. Branstetter; John T. Longino; Philip S. Ward; Brant C. Faircloth (2017). Enriching the ant tree of life: enhanced UCE bait set for genome-scale phylogenetics of ants and other Hymenoptera [Dataset]. http://doi.org/10.5061/dryad.89n87
    Available download formats: zip
    Dataset updated
    Feb 14, 2017
    Dataset provided by
    Dryad
    Authors
    Michael G. Branstetter; John T. Longino; Philip S. Ward; Brant C. Faircloth
    Time period covered
    Feb 12, 2017
    Area covered
    Global
    Description
    1. Targeted enrichment of conserved genomic regions (e.g., ultraconserved elements or UCEs) has emerged as a promising tool for inferring evolutionary history in many organismal groups. Because the UCE approach is still relatively new, much remains to be learned about how best to identify UCE loci and design baits to enrich them.

    2. We test an updated UCE identification and bait design workflow for the insect order Hymenoptera, with a particular focus on ants. The new strategy augments a previous bait design for Hymenoptera by (a) changing the parameters by which conserved genomic regions are identified and retained, and (b) increasing the number of genomes used for locus identification and bait design. We perform in vitro validation of the approach in ants by synthesizing an ant-specific bait set that targets UCE loci and a set of “legacy” phylogenetic markers. Using this bait set, we generate new data for 84 taxa (16/17 ant subfamilies) and extract loci from an additional 17 genome-e...

  20. Data on the Enrichment and Isolation of the Acetylenotrophic and...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Data on the Enrichment and Isolation of the Acetylenotrophic and Diazotrophic Isolate Bradyrhizobium sp. strain I71 (ver. 2.0, September 2022) [Dataset]. https://catalog.data.gov/dataset/data-on-the-enrichment-and-isolation-of-the-acetylenotrophic-and-diazotrophic-isolate-brad
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    Acetylene (C2H2) is a molecule rarely found in nature, with few known natural sources, but acetylenotrophic microorganisms can use acetylene as their primary carbon and energy source. As of 2018 there were 15 known strains of aerobic and anaerobic acetylenotrophs; however, we hypothesized that there may be yet unrecognized diversity of acetylenotrophs in nature. In this study, we expanded this diversity by isolating an aerobic acetylenotroph, Bradyrhizobium sp. strain I71, from trichloroethene (TCE)-contaminated soils undergoing bioremediation. TCE-contaminated soils from the NASA Ames Research Center in California were used to establish soil microcosms with acetylene as the primary carbon substrate, and acetylene uptake was tracked over time and reported in T1_soil_microcosm_v2.0.csv. DNA was extracted from soil microcosm samples for microbial community analysis based on 16S rRNA gene sequencing; the resulting operational taxonomic units are presented in T2_soil_OTU_v2.0.csv. Bradyrhizobium sp. strain I71 was isolated from the soil microcosms, and acetylene uptake and cell growth data for the isolate over time are shown in T3_soil_isolate_v2.0.csv. Nitrogen fixation assays for the pure culture of Bradyrhizobium sp. strain I71 are reported in T4_N2_fixation_v2.0.csv. Acetylene concentrations and cell densities from acetylenotrophic and heterotrophic growth assays for Bradyrhizobium sp. strain I71 are reported in T5_GrowthCurve_v2.0.csv.

Cite
Theodor Sperlea; Stefan Füser; Jens Boenigk; Dominik Heider (2023). Additional file 1 of SEDE-GPS: socio-economic data enrichment based on GPS information [Dataset]. http://doi.org/10.6084/m9.figshare.7405250.v1

Additional file 1 of SEDE-GPS: socio-economic data enrichment based on GPS information

Related Article
Available download formats: txt
Dataset updated
May 31, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Theodor Sperlea; Stefan Füser; Jens Boenigk; Dominik Heider
License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

Description

This table contains names, positions, and references for the samples contained in the sequence dataset and whether Prokaryotes and/or Eukaryotes were analyzed from the sample in this study. (CSV 3 kb)
