100+ datasets found
  1. Data from: A Local Asynchronous Distributed Privacy Preserving Feature...

    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    • data.nasa.gov
    • +1 more
    Updated Feb 18, 2025
    Cite
    nasa.gov (2025). A Local Asynchronous Distributed Privacy Preserving Feature Selection Algorithm for Large Peer-to-Peer Networks [Dataset]. https://data.staging.idas-ds1.appdat.jsc.nasa.gov/dataset/a-local-asynchronous-distributed-privacy-preserving-feature-selection-algorithm-for-large-
    Explore at:
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    In this paper we develop a local distributed privacy preserving algorithm for feature selection in a large peer-to-peer environment. Feature selection is often used in machine learning for data compaction and efficient learning by eliminating the curse of dimensionality. There exist many solutions for feature selection when the data is located at a central location. However, it becomes extremely challenging to perform the same when the data is distributed across a large number of peers or machines. Centralizing the entire dataset or portions of it can be very costly and impractical because of the large number of data sources, the asynchronous nature of the peer-to-peer networks, the dynamic nature of the data/network, and privacy concerns. The solution proposed in this paper allows us to perform feature selection in an asynchronous fashion with a low communication overhead where each peer can specify its own privacy constraints. The algorithm works based on local interactions among participating nodes. We present results on real-world datasets in order to demonstrate the performance of the proposed algorithm.

  2. Census Data - Selected socioeconomic indicators in Chicago, 2008 – 2012

    • catalog.data.gov
    • data.cityofchicago.org
    • +2 more
    Updated Jan 12, 2024
    + more versions
    Cite
    data.cityofchicago.org (2024). Census Data - Selected socioeconomic indicators in Chicago, 2008 – 2012 [Dataset]. https://catalog.data.gov/dataset/census-data-selected-socioeconomic-indicators-in-chicago-2008-2012
    Explore at:
    Dataset updated
    Jan 12, 2024
    Dataset provided by
    data.cityofchicago.org
    Area covered
    Chicago
    Description

    This dataset contains a selection of six socioeconomic indicators of public health significance and a “hardship index,” by Chicago community area, for the years 2008 – 2012. The indicators are the percent of occupied housing units with more than one person per room (i.e., crowded housing); the percent of households living below the federal poverty level; the percent of persons in the labor force over the age of 16 years that are unemployed; the percent of persons over the age of 25 years without a high school diploma; the percent of the population under 18 or over 64 years of age (i.e., dependency); and per capita income. Indicators for Chicago as a whole are provided in the final row of the table. See the full dataset description for more information at: https://data.cityofchicago.org/api/views/fwb8-6aw5/files/A5KBlegGR2nWI1jgP6pjJl32CTPwPbkl9KU3FxlZk-A?download=true&filename=P:\EPI\OEPHI\MATERIALS\REFERENCES\ECONOMIC_INDICATORS\Dataset_Description_socioeconomic_indicators_2012_FOR_PORTAL_ONLY.pdf

  3. Firm's selection of data services on Kubernetes environments worldwide 2024

    • statista.com
    Updated Aug 26, 2024
    Cite
    Firm's selection of data services on Kubernetes environments worldwide 2024 [Dataset]. https://www.statista.com/statistics/1480261/data-service-of-choice-on-kubernetes-environment/
    Explore at:
    Dataset updated
    Aug 26, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2024
    Area covered
    Worldwide
    Description

    As of 2024, around 72 percent of organizations chose databases (NoSQL, SQL etc.) on Kubernetes environments. Additionally, 67 percent of organizations utilized analytics (Data processing/ELT/ETL).

  4. Recruitment and Selection Activity Year End Report

    • catalog.data.gov
    • data.montgomerycountymd.gov
    • +1 more
    Updated Apr 8, 2023
    + more versions
    Cite
    data.montgomerycountymd.gov (2023). Recruitment and Selection Activity Year End Report [Dataset]. https://catalog.data.gov/dataset/recruitment-and-selection-activity-year-end-report
    Explore at:
    Dataset updated
    Apr 8, 2023
    Dataset provided by
    data.montgomerycountymd.gov
    Description

    This dataset provides information on the MCG Recruitment and Selection Activities, including the volume of applications received for each job vacancy, the number of applicants hired, applicant statuses, and the type of hires (Permanent, Temporary, Rehire) for the respective fiscal year. Update Frequency: Annually

  5. Binary response panel data models with sample selection and self‐selection...

    • b2find.dkrz.de
    Updated Oct 24, 2023
    + more versions
    Cite
    (2023). Binary response panel data models with sample selection and self‐selection (replication data) - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/9d89ff8f-4cf1-5fef-883d-3821134c99cd
    Explore at:
    Dataset updated
    Oct 24, 2023
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    We consider estimating binary response models on an unbalanced panel, where the outcome of the dependent variable may be missing due to nonrandom selection, or there is self-selection into a treatment. In the present paper, we first consider estimation of sample selection models and treatment effects using a fully parametric approach, where the error distribution is assumed to be normal in both primary and selection equations. Arbitrary time dependence in errors is permitted. Estimation of both coefficients and partial effects, as well as tests for selection bias, are discussed. Furthermore, we consider a semiparametric estimator of binary response panel data models with sample selection that is robust to a variety of error distributions. The estimator employs a control function approach to account for endogenous selection and permits consistent estimation of scaled coefficients and relative effects.

  6. Data from: Evaluating presence-only species distribution models with...

    • datadryad.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Aug 21, 2020
    Cite
    Dan L. Warren; Nicholas Matzke; Teresa Iglesias (2020). Evaluating presence-only species distribution models with discrimination accuracy is uninformative for many applications [Dataset]. http://doi.org/10.5061/dryad.6ft55k9
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 21, 2020
    Dataset provided by
    Dryad
    Authors
    Dan L. Warren; Nicholas Matzke; Teresa Iglesias
    Time period covered
    2020
    Area covered
    Australia
    Description

    Simulation code for Warren et al. 2019 - Journal of Biogeography

    Simulation code to accompany Warren et al. 2019, examining the relationship between discrimination accuracy and functional accuracy for ENM/SDM studies.

    File: sim-code-Warren-et-al-2019-master.zip

  7. Data from: Simulated data for genomic selection and genome-wide association...

    • datadryad.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Jul 25, 2014
    Cite
    John M. Hickey; Gregor Gorjanc (2014). Simulated data for genomic selection and genome-wide association studies using a combination of coalescent and gene drop methods [Dataset]. http://doi.org/10.5061/dryad.nm290
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 25, 2014
    Dataset provided by
    Dryad
    Authors
    John M. Hickey; Gregor Gorjanc
    Time period covered
    2014
    Description

    File S1

    1) AlphaDrop: executable for Linux

    2) macs: MaCS executable for linux

    3) msformatter: MaCS executable for linux

    4) Seed.txt: a file containing a random seed for initialising AlphaDrop

    5) RunMacs.sh: a shell script called by AlphaDrop when it runs MaCS

    6) AlphaDropSpec.txt: the specification file for AlphaDrop

    7) Pedigree.txt: an example externally supplied pedigree file

    8) MaCsSimulationParameters.xlsx: an excel sheet with which MaCS parameters can be calculated

    9) Ne100.sh: example of what to put into RunMacs.sh (Ne100 population of Hickey et al., 2011 Genetics Selection Evolution)

    10) Ne1000.sh: example of what to put into RunMacs.sh (Ne1000 population of Hickey et al., 2011 Genetics Selection Evolution)

    File: FileS1.zip

    Simulated Data - Part 1

    Ten replicates of a livestock data structure were simulated. The structure was designed to cover a spectrum of QTL distributions, relationship structures, and SNP densities and to mimic some of the scenarios where genomic selection is ap...

  8. Importance of collecting selected behavioral data in marketing worldwide...

    • statista.com
    Updated Jun 3, 2024
    Cite
    Importance of collecting selected behavioral data in marketing worldwide 2024 [Dataset]. https://www.statista.com/statistics/1470128/importance-collect-data-worldwide/
    Explore at:
    Dataset updated
    Jun 3, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2024
    Area covered
    World
    Description

    During a survey carried out among decision-makers in charge of customer engagement/retention strategy from 20 countries worldwide, 84 percent of respondents stated that they thought it was important or critical to collect customer channel engagement data; three in four named real-time experience in this context.

  9. Echo Analytics | Market Analysis | Consumer Behavior Data |Europe |...

    • datarade.ai
    .csv, .xls, .xml
    Updated Oct 27, 2022
    Cite
    Echo Analytics (2022). Echo Analytics | Market Analysis | Consumer Behavior Data |Europe | Available Globally | GDPR-Compliant [Dataset]. https://datarade.ai/data-categories/consumer-behavior-data/datasets
    Explore at:
    Available download formats: .csv, .xls, .xml
    Dataset updated
    Oct 27, 2022
    Dataset authored and provided by
    Echo Analytics
    Area covered
    France, Germany, Belgium, Sweden, Italy, United Kingdom, Spain
    Description

    At Echo, our dedication to data curation is unmatched; we focus on providing our clients with an in-depth picture of a physical location based on activity in and around the point of interest (POI) over time. Our dataset empowers you to explore the cross-shopping patterns from your visitors by allowing you to dig deeper into consumer profiles, eliminate gaps in your trade area and discover untapped sites of action.

    This sample of our Market Analysis solution helps you determine the geographical reach of your store or facility based on the brands or categories most visited by consumers who visit your specific POI. This empowers your location strategy. This particular dataset is for Europe.

    Additional Information:

    • Understand the actual movement patterns of consumers without using PII data, gaining a 360-degree consumer view. Complement your online behavior knowledge with actual offline actions, and better attribute intent based on real-world behaviors.
    • Echo collects, cleans and updates its footfall on a daily basis. Normalization of the data occurs on a monthly basis.
    • We provide data aggregation on a weekly, monthly and quarterly basis.
    • Information about our country offering and data schema can be found here:

      1) Data Schema: https://docs.echo-analytics.com/activity/data-schema
      2) Country Availability: https://docs.echo-analytics.com/activity/country-coverage
      3) Methodology: https://docs.echo-analytics.com/activity/methodology

      Echo's commitment to customer service is evident in our exceptional data quality and dedicated team, providing 360° support throughout your location intelligence journey. We handle the complex tasks to deliver analysis-ready datasets to you.

    Business Needs:

    • Site Selection and Lease Renegotiation: Leverage foot traffic data for optimal site selection and advantageous lease renegotiations. This approach enables you to pinpoint ideal store locations and secure lease terms that align with business objectives, optimizing operational efficiency and cost-effectiveness.
    • Market Intelligence: Outsmart your competition by understanding competitor foot traffic trends, allowing you to identify growth opportunities and gain a competitive advantage. Analyze regional consumer behaviors and preferences to pinpoint new markets and assess the competitive landscape for strategic expansion.

  10. Data from: Benchmarking parametric and machine learning models for genomic...

    • zenodo.org
    • data.niaid.nih.gov
    • +2 more
    csv, txt
    Updated Jun 2, 2022
    + more versions
    Cite
    Christina B Azodi; Andrew McCarren; Mark Roantree; Gustavo de los Campos; Shin-Han Shiu; Emily Bolger (2022). Benchmarking parametric and machine learning models for genomic prediction of complex traits [Dataset]. http://doi.org/10.5061/dryad.xksn02vb9
    Explore at:
    Available download formats: csv, txt
    Dataset updated
    Jun 2, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christina B Azodi; Andrew McCarren; Mark Roantree; Gustavo de los Campos; Shin-Han Shiu; Emily Bolger
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The usefulness of genomic prediction in crop and livestock breeding programs has prompted efforts to develop new and improved genomic prediction algorithms, such as artificial neural networks and gradient tree boosting. However, the performance of these algorithms has not been compared in a systematic manner using a wide range of datasets and models. Using data of 18 traits across six plant species with different marker densities and training population sizes, we compared the performance of six linear and six non-linear algorithms. First, we found that hyperparameter selection was necessary for all non-linear algorithms and that feature selection prior to model training was critical for artificial neural networks when the markers greatly outnumbered the training lines. Across all species and trait combinations, no one algorithm performed best; however, predictions based on a combination of results from multiple algorithms (i.e. ensemble predictions) performed consistently well. While linear and non-linear algorithms performed best for a similar number of traits, the performance of non-linear algorithms varied more between traits. Although artificial neural networks did not perform best for any trait, we identified strategies (i.e. feature selection, seeded starting weights) that boosted their performance to near the level of other algorithms. Our results highlight the importance of algorithm selection for the prediction of trait values.

  11. Replication data for: Core Determining Class and Inequality Selection

    • openicpsr.org
    Updated May 1, 2017
    Cite
    Ye Luo; Hai Wang (2017). Replication data for: Core Determining Class and Inequality Selection [Dataset]. http://doi.org/10.3886/E113508V1
    Explore at:
    Dataset updated
    May 1, 2017
    Dataset provided by
    American Economic Association
    Authors
    Ye Luo; Hai Wang
    Description

    The relations between unobserved events and observed outcomes can be characterized by a bipartite graph. We propose an algorithm that explores the structure of the graph to construct the "exact Core Determining Class," i.e., the set of irredundant inequalities. We prove that in general the exact Core Determining Class does not depend on the probability measure of the outcomes but only on the structure of the graph. For more general linear inequalities selection problems, we propose a statistical procedure similar to the Dantzig Selector to select the truly informative constraints. We demonstrate the performance of our procedures in Monte-Carlo experiments.

  12. Data period selection for the EU ETS and China’s carbon trading pilots

    • data.subak.org
    xls
    Updated Feb 15, 2023
    Cite
    Figshare (2023). Data period selection for the EU ETS and China’s carbon trading pilots [Dataset]. http://doi.org/10.1371/journal.pone.0238033.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Feb 15, 2023
    Dataset provided by
    figshare
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    China
    Description

    Data period selection for the EU ETS and China’s carbon trading pilots.

  13. Data from: A Unified Approach to Variable Selection for Partially Linear...

    • tandf.figshare.com
    zip
    Updated Mar 4, 2024
    Cite
    Youhan Lu; Yushen Dong; Juan Hu; Yichao Wu (2024). A Unified Approach to Variable Selection for Partially Linear Models [Dataset]. http://doi.org/10.6084/m9.figshare.23064566.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 4, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    Youhan Lu; Yushen Dong; Juan Hu; Yichao Wu
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    We focus on the general partially linear model without any structure assumption on the nonparametric component. For such a model with both linear and nonlinear predictors being multivariate, we propose a new variable selection method. Our new method is a unified approach in the sense that it can select both linear and nonlinear predictors simultaneously by solving a single optimization problem. We prove that the proposed method achieves consistency. Both simulation examples and a real data example are used to demonstrate the new method’s competitive finite-sample performance. Supplementary materials for this article are available online.

  14. POI Data | 230M+ Business Locations, Geographic & Places Insights

    • datarade.ai
    .json
    Updated Nov 14, 2023
    + more versions
    Cite
    Xverum (2023). POI Data | 230M+ Business Locations, Geographic & Places Insights [Dataset]. https://datarade.ai/data-categories/places-data/datasets
    Explore at:
    Available download formats: .json
    Dataset updated
    Nov 14, 2023
    Dataset authored and provided by
    Xverum
    Area covered
    Vanuatu, Central African Republic, Saint Kitts and Nevis, Estonia, Ecuador, Israel, French Southern Territories, Qatar, American Samoa, Angola
    Description

    Xverum’s Point of Interest (POI) Data is a comprehensive dataset of 230M+ verified locations, covering businesses, commercial properties, and public places across 5000+ industry categories. Our dataset enables retailers, investors, and GIS professionals to make data-driven decisions for business expansion, location intelligence, and geographic analysis.

    With regular updates and continuous POI discovery, Xverum ensures your mapping and business location models have the latest data on business openings, closures, and geographic trends. Delivered in bulk via S3 Bucket or cloud storage, our dataset integrates seamlessly into geospatial analysis, market research, and navigation platforms.

    🔥 Key Features:

    📌 Comprehensive POI Coverage
    ✅ 230M+ global business & location data points, spanning 5000+ industry categories.
    ✅ Covers retail stores, corporate offices, hospitality venues, service providers & public spaces.

    🌍 Geographic & Business Location Insights
    ✅ Latitude & longitude coordinates for accurate mapping & navigation.
    ✅ Country, state, city, and postal code classifications.
    ✅ Business status tracking – Open, temporarily closed, permanently closed.

    🆕 Continuous Discovery & Regular Updates
    ✅ New business locations & POIs added continuously.
    ✅ Regular updates to reflect business openings, closures & relocations.

    📊 Rich Business & Location Data
    ✅ Company name, industry classification & category insights.
    ✅ Contact details, including phone number & website (if available).
    ✅ Consumer review insights, including rating distribution (optional feature).

    📍 Optimized for Business & Geographic Analysis
    ✅ Supports GIS, navigation systems & real estate site selection.
    ✅ Enhances location-based marketing & competitive analysis.
    ✅ Enables data-driven decision-making for business expansion & urban planning.

    🔐 Bulk Data Delivery (NO API)
    ✅ Delivered in bulk via S3 Bucket or cloud storage.
    ✅ Available in structured formats (.csv, .json, .xml) for seamless integration.

    🏆 Primary Use Cases:

    📈 Business Expansion & Market Research
    🔹 Identify key business locations & competitors for strategic growth.
    🔹 Assess market saturation & regional industry presence.

    📊 Geographic Intelligence & Mapping Solutions
    🔹 Enhance GIS platforms & navigation systems with precise POI data.
    🔹 Support smart city & infrastructure planning with location insights.

    🏪 Retail Site Selection & Consumer Insights
    🔹 Analyze high-traffic locations for new store placements.
    🔹 Understand customer behavior through business density & POI patterns.

    🌍 Location-Based Advertising & Geospatial Analytics
    🔹 Improve targeted marketing with location-based insights.
    🔹 Leverage geographic data for precision advertising & customer segmentation.

    💡 Why Choose Xverum’s POI Data?

    - 230M+ Verified POI Records – One of the largest & most structured business location datasets available.
    - Global Coverage – Spanning 249+ countries, covering all major business categories.
    - Regular Updates & New POI Discoveries – Ensuring accuracy.
    - Comprehensive Geographic & Business Data – Coordinates, industry classifications & category insights.
    - Bulk Dataset Delivery (NO API) – Direct access via S3 Bucket or cloud storage.
    - 100% GDPR & CCPA-Compliant – Ethically sourced & legally compliant.

    Access Xverum’s 230M+ POI Data for business location intelligence, geographic analysis & market research. Request a free sample or contact us to customize your dataset today!

  15. Willingness to share selected personal data with insurance providers U.S....

    • statista.com
    Updated Mar 7, 2022
    Cite
    Statista (2022). Willingness to share selected personal data with insurance providers U.S. 2019 [Dataset]. https://www.statista.com/statistics/1184447/willingness-share-data-with-insurance-providers-type-us/
    Explore at:
    Dataset updated
    Mar 7, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Oct 2019
    Area covered
    United States
    Description

    Most U.S. consumers are open to sharing information with insurance providers, although a 2019 survey finds that this willingness quickly decreases the more personal the information becomes. According to the survey, around two-thirds of consumers would be willing to share driving and claims history. However, just 31 percent of respondents are willing to share social media information, and only 28 percent are comfortable sharing mobile phone data.

  16. Smart Home apps collecting selected types of data points 2024

    • statista.com
    Updated Feb 3, 2025
    Cite
    Statista (2025). Smart Home apps collecting selected types of data points 2024 [Dataset]. https://www.statista.com/statistics/1552490/data-collection-smart-homes/
    Explore at:
    Dataset updated
    Feb 3, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    During the second quarter of 2024, the largest number of Smart Home mobile applications examined reported crash data to their publishers. Overall, 325 mobile apps in this category collected crash reports for functioning analytics. Approximately 294 apps collected e-mail addresses, while 286 collected product interaction data from their users. Smart Home applications can have several functions, ranging from regulating homes' thermostats to operating motion sensors and pet cameras.

  17. Data from: Estimating uncertainty in multivariate responses to selection

    • datadryad.org
    • search.dataone.org
    • +2 more
    zip
    Updated Nov 15, 2013
    + more versions
    Cite
    Estimating uncertainty in multivariate responses to selection [Dataset]. https://datadryad.org/stash/dataset/doi:10.5061/dryad.384nf
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 15, 2013
    Dataset provided by
    Dryad
    Authors
    John R. Stinchcombe; Anna K. Simonsen; Mark W. Blows
    Time period covered
    2013
    Area covered
    Koffler Scientific Reserve
    Description

    Phenotypic data on flowering time, size, and relative fitness

    See read me file.

    File: Dryad_control_data.txt

  18. Data from: Artificial selection reveals heritable variation for...

    • datadryad.org
    • zenodo.org
    zip
    Updated Jun 16, 2011
    Cite
    Ashley J.R. Carter; David Houle (2011). Artificial selection reveals heritable variation for developmental instability [Dataset]. http://doi.org/10.5061/dryad.dt3s7
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 16, 2011
    Dataset provided by
    Dryad
    Authors
    Ashley J.R. Carter; David Houle
    Time period covered
    2011
    Description

    carter-houle-evol2011-description: This file contains descriptions of data column headings in other files. It is attached to each other file as a readme.

    carter-houle-evol2011-U1: Data for the U1 line as described in the paper.

    carter-houle-evol2011-U2: Data for the U2 line as described in the paper.

    carter-houle-evol2011-D1: Data for the D1 line as described in the paper.

    carter-houle-evol2011-D2: Data for the D2 line as described in the paper.

  19. Data sets for variable selection and relation analysis - Dataset - B2FIND

    • b2find.dkrz.de
    Updated Oct 24, 2023
    + more versions
    Cite
    (2023). Data sets for variable selection and relation analysis - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/b8822082-0b81-5f64-841c-b85bcc8c12b5
    Explore at:
    Dataset updated
    Oct 24, 2023
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    These data sets are used in the linked publication proposing the two novel approaches mutual forest impact (MFI) and mutual impurity reduction (MIR). Simulation study 1 was conducted to analyze the bias of importance and relation measures and contains two null scenarios with increasing number of expression possibilities (A) and with increasing minor allele frequencies (B). For each scenario, a classification, regression and survival outcome was simulated. The data contains scripts for simulation and the simulated data. Simulation study 2 was conducted to analyze the selection of variables in the presence of correlations. The data contains scripts for simulation and the simulated data. Simulation study 3 was conducted for the comparison of the feature selection approaches under realistic correlation structures. It is based on a realistic covariance matrix (mvn.RData) generated from an RNA-microarray dataset of breast cancer patients with 12,592 genes obtained from The Cancer Genome Atlas. The data contains only scripts for simulation. The data of the real data application is published in two csv files: "vcf.csv" contains the SNP data of the subset of the plastid genome data set of Solanum Section Petota species (Huang et al., 2019) in a variant call format file. For this, multiple sequence alignments of 43 genes were conducted with QIAGEN CLC Genomics Workbench 22.0.2 (digitalinsights.qiagen.com) and SNP-sites was subsequently used to generate variant call format (VCF) files. These files were merged into a file of 257 SNPs for further analysis. "vcf_input_withCountry.csv" contains the same data but with the additional country category. The data is also in a ready-to-use format for further analysis.

  20. Data for: PickMe: Sample selection for species tree reconstruction using...

    • datadryad.org
    • search.dataone.org
    • +1 more
    zip
    Updated Oct 14, 2024
    Cite
    Joseph Rusinko; Yu Cai; Allison Crysler; Katherine Thompson; Julien Boutte; Mark Fishbein; Shannon Straub (2024). Data for: PickMe: Sample selection for species tree reconstruction using coalescent weighted quartets [Dataset]. http://doi.org/10.5061/dryad.3r2280ggv
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 14, 2024
    Dataset provided by
    Dryad
    Authors
    Joseph Rusinko; Yu Cai; Allison Crysler; Katherine Thompson; Julien Boutte; Mark Fishbein; Shannon Straub
    Time period covered
    2021
    Description

    Data for: PickMe: sample selection for species tree reconstruction using coalescent weighted quartets

    https://doi.org/10.5061/dryad.3r2280ggv

    Description of the data and file structure

    Data was collected for the analysis of the evolutionary relationships among milkweeds. The remaining data was used to test the PickMe algorithm for sample selection in the context of phylogenomic analysis.

    Data Descriptions

    - Milkweed-Sequence-Files.zip: Contains sequence data for the analysis. By the time of publication, all sequences will be referenced on GenBank.

    - estimated-gene-trees-NJ-Uncorrected and estimated-gene-trees-RAxML: Contain all estimated Milkweed gene trees as described in the associated article. Sample names were cleaned up for the main manuscript. A log for matching is listed in a text file.

    - OldSpeciesTree.cf.tree: The species tree referenced in the paper, based ...
