In this paper we develop a local, distributed, privacy-preserving algorithm for feature selection in a large peer-to-peer environment. Feature selection is often used in machine learning for data compaction and efficient learning, and to mitigate the curse of dimensionality. Many solutions exist for feature selection when the data is located at a central location. However, it becomes extremely challenging to perform the same task when the data is distributed across a large number of peers or machines. Centralizing the entire dataset, or portions of it, can be very costly and impractical because of the large number of data sources, the asynchronous nature of peer-to-peer networks, the dynamic nature of the data and network, and privacy concerns. The solution proposed in this paper performs feature selection in an asynchronous fashion with low communication overhead, where each peer can specify its own privacy constraints. The algorithm works through local interactions among participating nodes. We present results on real-world datasets in order to demonstrate the performance of the proposed algorithm.
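The abstract does not reproduce the paper's protocol, but its local building block — each peer ranking features on its own data before any communication — can be sketched as follows. The information-gain score, the toy data, and all function names below are illustrative assumptions, not the paper's actual algorithm:

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature_col, labels):
    """Reduction in label entropy after splitting on one discrete feature."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(feature_col):
        subset = [y for x, y in zip(feature_col, labels) if x == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

def select_features(rows, labels, k):
    """Rank features by information gain on this peer's local data; keep top k."""
    scores = [(info_gain([r[j] for r in rows], labels), j)
              for j in range(len(rows[0]))]
    return [j for _, j in sorted(scores, reverse=True)[:k]]

# Tiny example: feature 0 predicts the label perfectly, feature 1 is noise.
rows   = [(0, 1), (0, 0), (1, 1), (1, 0)]
labels = [0, 0, 1, 1]
print(select_features(rows, labels, 1))  # -> [0]
```

In the distributed setting, such local scores would then be combined across neighboring peers under each peer's privacy constraints; that aggregation step is the substance of the paper and is not sketched here.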
This dataset contains a selection of six socioeconomic indicators of public health significance and a “hardship index,” by Chicago community area, for the years 2008 – 2012. The indicators are the percent of occupied housing units with more than one person per room (i.e., crowded housing); the percent of households living below the federal poverty level; the percent of persons in the labor force over the age of 16 years that are unemployed; the percent of persons over the age of 25 years without a high school diploma; the percent of the population under 18 or over 64 years of age (i.e., dependency); and per capita income. Indicators for Chicago as a whole are provided in the final row of the table. See the full dataset description for more information at: https://data.cityofchicago.org/api/views/fwb8-6aw5/files/A5KBlegGR2nWI1jgP6pjJl32CTPwPbkl9KU3FxlZk-A?download=true&filename=P:\EPI\OEPHI\MATERIALS\REFERENCES\ECONOMIC_INDICATORS\Dataset_Description_socioeconomic_indicators_2012_FOR_PORTAL_ONLY.pdf
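As a quick illustration of how this table might be consumed programmatically, the snippet below ranks community areas by the hardship index. The two sample rows and the column names are illustrative placeholders, not values from the dataset; consult the linked dataset description for the real schema:

```python
import csv
import io

# Illustrative placeholder rows; real column names and values come from the portal export.
sample = """community_area,percent_households_below_poverty,per_capita_income,hardship_index
Area A,56.5,8201,98
Area B,12.9,88669,1
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Rank community areas from most to least hardship.
ranked = sorted(rows, key=lambda r: int(r["hardship_index"]), reverse=True)
print([r["community_area"] for r in ranked])  # -> ['Area A', 'Area B']
```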
As of 2024, around 72 percent of organizations ran databases (NoSQL, SQL, etc.) in Kubernetes environments. Additionally, 67 percent of organizations ran analytics workloads (data processing/ELT/ETL).
This dataset provides information on MCG recruitment and selection activities, including the volume of applications received for each job vacancy, the number of applicants hired, applicant statuses, and the type of hires (Permanent, Temporary, Rehire) for the respective fiscal year. Update Frequency: Annually
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We consider estimating binary response models on an unbalanced panel, where the outcome of the dependent variable may be missing due to nonrandom selection, or there is self-selection into a treatment. In the present paper, we first consider estimation of sample selection models and treatment effects using a fully parametric approach, where the error distribution is assumed to be normal in both primary and selection equations. Arbitrary time dependence in errors is permitted. Estimation of both coefficients and partial effects, as well as tests for selection bias, are discussed. Furthermore, we consider a semiparametric estimator of binary response panel data models with sample selection that is robust to a variety of error distributions. The estimator employs a control function approach to account for endogenous selection and permits consistent estimation of scaled coefficients and relative effects.
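For concreteness, one standard parametric formulation consistent with this setup — a panel sample selection model with normal errors; the paper's exact specification and notation may differ — is:

```latex
\[
\begin{aligned}
s_{it} &= \mathbf{1}\{ z_{it}\gamma + v_{it} > 0 \} && \text{(selection equation)} \\
y_{it} &= \mathbf{1}\{ x_{it}\beta + u_{it} > 0 \}, \quad \text{observed only if } s_{it} = 1 && \text{(primary equation)}
\end{aligned}
\]
```

Correlation between $u_{it}$ and $v_{it}$ induces selection bias; a control function (for example, the inverse Mills ratio $\lambda(z_{it}\hat\gamma)$ from a first-stage probit) can be added to the primary equation to account for endogenous selection.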
Simulation code for Warren et al. 2019 - Journal of Biogeography
Simulation code to accompany Warren et al. 2019, examining the relationship between discrimination accuracy and functional accuracy for ENM/SDM studies.
sim-code-Warren-et-al-2019-master.zip
File S1
1) AlphaDrop: executable for Linux
2) macs: MaCS executable for Linux
3) msformatter: MaCS executable for Linux
4) Seed.txt: a file containing a random seed for initialising AlphaDrop
5) RunMacs.sh: a shell script called by AlphaDrop when it runs MaCS
6) AlphaDropSpec.txt: the specification file for AlphaDrop
7) Pedigree.txt: an example externally supplied pedigree file
8) MaCsSimulationParameters.xlsx: an Excel sheet with which MaCS parameters can be calculated
9) Ne100.sh: example of what to put into RunMacs.sh (Ne100 population of Hickey et al., 2011 Genetics Selection Evolution)
10) Ne1000.sh: example of what to put into RunMacs.sh (Ne1000 population of Hickey et al., 2011 Genetics Selection Evolution)
FileS1.zip
Simulated Data - Part 1
Ten replicates of a livestock data structure were simulated. The structure was designed to cover a spectrum of QTL distributions, relationship structures, and SNP densities and to mimic some of the scenarios where genomic selection is ap...
During a survey carried out among decision-makers in charge of customer engagement/retention strategy in 20 countries worldwide, 84 percent of respondents stated that they thought it was important or critical to collect customer channel engagement data; three in four respondents named real-time experience in this context.
At Echo, our dedication to data curation is unmatched; we focus on providing our clients with an in-depth picture of a physical location based on activity in and around the point of interest (POI) over time. Our dataset empowers you to explore the cross-shopping patterns from your visitors by allowing you to dig deeper into consumer profiles, eliminate gaps in your trade area and discover untapped sites of action.
This sample of our Market Analysis solution helps you determine the geographical reach of your store or facility based on the brands or categories most visited by consumers who visit your specific POI. This empowers your location strategy. This particular dataset is for Europe.
Additional Information:
Information about our country offering and data schema can be found here:
1) Data Schema: https://docs.echo-analytics.com/activity/data-schema
2) Country Availability: https://docs.echo-analytics.com/activity/country-coverage
3) Methodology: https://docs.echo-analytics.com/activity/methodology
Echo's commitment to customer service is evident in our exceptional data quality and dedicated team, providing 360° support throughout your location intelligence journey. We handle the complex tasks to deliver analysis-ready datasets to you.
Business Needs:
- Site Selection and Lease Renegotiation: Leverage foot traffic data for optimal site selection and advantageous lease renegotiations. This approach enables you to pinpoint ideal store locations and secure lease terms that align with business objectives, optimizing operational efficiency and cost-effectiveness.
- Market Intelligence: Outsmart your competition by understanding competitor foot traffic trends, allowing you to identify growth opportunities and gain a competitive advantage. Analyze regional consumer behaviors and preferences to pinpoint new markets and assess the competitive landscape for strategic expansion.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The usefulness of genomic prediction in crop and livestock breeding programs has prompted efforts to develop new and improved genomic prediction algorithms, such as artificial neural networks and gradient tree boosting. However, the performance of these algorithms has not been compared in a systematic manner using a wide range of datasets and models. Using data on 18 traits across six plant species with different marker densities and training population sizes, we compared the performance of six linear and six non-linear algorithms. First, we found that hyperparameter selection was necessary for all non-linear algorithms and that feature selection prior to model training was critical for artificial neural networks when the markers greatly outnumbered the training lines. Across all species and trait combinations, no single algorithm performed best; however, predictions based on a combination of results from multiple algorithms (i.e., ensemble predictions) performed consistently well. While linear and non-linear algorithms performed best for a similar number of traits, the performance of non-linear algorithms varied more between traits. Although artificial neural networks did not perform best for any trait, we identified strategies (i.e., feature selection, seeded starting weights) that boosted their performance to near the level of other algorithms. Our results highlight the importance of algorithm selection for the prediction of trait values.
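The ensemble idea mentioned above can be sketched in a few lines — here an unweighted average of per-line predictions from several fitted models. The algorithm names and numbers are hypothetical placeholders, not results from the study:

```python
def ensemble_predict(predictions):
    """Equal-weight average of per-line predictions from several algorithms."""
    n_models = len(predictions)
    return [sum(vals) / n_models for vals in zip(*predictions)]

# Hypothetical trait predictions for three lines from three algorithms.
ridge = [1.0, 2.0, 3.0]
gbm   = [1.2, 1.8, 3.4]
ann   = [0.8, 2.2, 2.8]

print(ensemble_predict([ridge, gbm, ann]))  # first two entries: 1.0 and 2.0
```

More refined variants weight each algorithm by its cross-validated accuracy, but even this equal-weight average captures why ensembles performed consistently well: it smooths out the trait-to-trait variability of the individual algorithms.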
The relations between unobserved events and observed outcomes can be characterized by a bipartite graph. We propose an algorithm that explores the structure of the graph to construct the "exact Core Determining Class," i.e., the set of irredundant inequalities. We prove that in general the exact Core Determining Class does not depend on the probability measure of the outcomes but only on the structure of the graph. For more general linear inequality selection problems, we propose a statistical procedure similar to the Dantzig Selector to select the truly informative constraints. We demonstrate the performance of our procedures in Monte Carlo experiments.
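For reference, the classical Dantzig Selector that the proposed selection procedure resembles solves, for a design matrix $X$, response $y$, and tuning parameter $\lambda$:

```latex
\[
\min_{\beta} \|\beta\|_1 \quad \text{subject to} \quad \bigl\| X^{\top} (y - X\beta) \bigr\|_{\infty} \le \lambda,
\]
```

i.e., among all coefficient vectors whose residuals are nearly uncorrelated with every predictor, it picks the sparsest in the $\ell_1$ sense. How this carries over to selecting informative inequality constraints is the paper's contribution and is not reproduced here.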
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data period selection for the EU ETS and China’s carbon trading pilots.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We focus on the general partially linear model without any structure assumption on the nonparametric component. For such a model with both linear and nonlinear predictors being multivariate, we propose a new variable selection method. Our new method is a unified approach in the sense that it can select both linear and nonlinear predictors simultaneously by solving a single optimization problem. We prove that the proposed method achieves consistency. Both simulation examples and a real data example are used to demonstrate the new method’s competitive finite-sample performance. Supplementary materials for this article are available online.
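A sketch of the model class in question (the notation here is assumed for illustration, not taken from the article):

```latex
\[
y_i = x_i^{\top}\beta + g(z_i) + \varepsilon_i, \qquad i = 1, \dots, n,
\]
```

where $x_i$ is the multivariate linear predictor, $g$ is an unknown (unstructured) nonparametric function of the multivariate predictor $z_i$, and the method selects the relevant components of both $x_i$ and $z_i$ by solving a single optimization problem.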
Xverum’s Point of Interest (POI) Data is a comprehensive dataset of 230M+ verified locations, covering businesses, commercial properties, and public places across 5000+ industry categories. Our dataset enables retailers, investors, and GIS professionals to make data-driven decisions for business expansion, location intelligence, and geographic analysis.
With regular updates and continuous POI discovery, Xverum ensures your mapping and business location models have the latest data on business openings, closures, and geographic trends. Delivered in bulk via S3 Bucket or cloud storage, our dataset integrates seamlessly into geospatial analysis, market research, and navigation platforms.
🔥 Key Features:
📌 Comprehensive POI Coverage ✅ 230M+ global business & location data points, spanning 5000+ industry categories. ✅ Covers retail stores, corporate offices, hospitality venues, service providers & public spaces.
🌍 Geographic & Business Location Insights ✅ Latitude & longitude coordinates for accurate mapping & navigation. ✅ Country, state, city, and postal code classifications. ✅ Business status tracking – Open, temporarily closed, permanently closed.
🆕 Continuous Discovery & Regular Updates ✅ New business locations & POIs added continuously. ✅ Regular updates to reflect business openings, closures & relocations.
📊 Rich Business & Location Data ✅ Company name, industry classification & category insights. ✅ Contact details, including phone number & website (if available). ✅ Consumer review insights, including rating distribution (optional feature).
📍 Optimized for Business & Geographic Analysis ✅ Supports GIS, navigation systems & real estate site selection. ✅ Enhances location-based marketing & competitive analysis. ✅ Enables data-driven decision-making for business expansion & urban planning.
🔐 Bulk Data Delivery (NO API) ✅ Delivered in bulk via S3 Bucket or cloud storage. ✅ Available in structured formats (.csv, .json, .xml) for seamless integration.
🏆 Primary Use Cases:
📈 Business Expansion & Market Research 🔹 Identify key business locations & competitors for strategic growth. 🔹 Assess market saturation & regional industry presence.
📊 Geographic Intelligence & Mapping Solutions 🔹 Enhance GIS platforms & navigation systems with precise POI data. 🔹 Support smart city & infrastructure planning with location insights.
🏪 Retail Site Selection & Consumer Insights 🔹 Analyze high-traffic locations for new store placements. 🔹 Understand customer behavior through business density & POI patterns.
🌍 Location-Based Advertising & Geospatial Analytics 🔹 Improve targeted marketing with location-based insights. 🔹 Leverage geographic data for precision advertising & customer segmentation.
💡 Why Choose Xverum’s POI Data? - 230M+ Verified POI Records – One of the largest & most structured business location datasets available. - Global Coverage – Spanning 249+ countries, covering all major business categories. - Regular Updates & New POI Discoveries – Ensuring accuracy. - Comprehensive Geographic & Business Data – Coordinates, industry classifications & category insights. - Bulk Dataset Delivery (NO API) – Direct access via S3 Bucket or cloud storage. - 100% GDPR & CCPA-Compliant – Ethically sourced & legally compliant.
Access Xverum’s 230M+ POI Data for business location intelligence, geographic analysis & market research. Request a free sample or contact us to customize your dataset today!
Most U.S. consumers are open to sharing information with insurance providers, although a 2019 survey finds that this willingness quickly decreases the more personal the information becomes. According to the survey, around two-thirds of consumers would be willing to share driving and claims history. However, just 31 percent of respondents are willing to share social media information, and only 28 percent are comfortable sharing mobile phone data.
During the second quarter of 2024, the largest number of Smart Home mobile applications examined reported crash data to their publishers. Overall, 325 mobile apps in this category collected crash reports for functionality analytics. Approximately 294 apps collected e-mail addresses, while 286 collected product interaction data from their users. Smart Home applications can serve several functions, from regulating homes' thermostats to operating motion sensors and pet cameras.
Phenotypic data on flowering time, size, and relative fitness. See the read me file.
Dryad_control_data.txt
carter-houle-evol2011-description: This file contains descriptions of data column headings in other files. It is attached to each other file as a readme.
carter-houle-evol2011-U1: Data for the U1 line as described in the paper.
carter-houle-evol2011-U2: Data for the U2 line as described in the paper.
carter-houle-evol2011-D1: Data for the D1 line as described in the paper.
carter-houle-evol2011-D2: Data for the D2 line as described in the paper.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data sets are used in the linked publication, which proposes the two novel approaches mutual forest impact (MFI) and mutual impurity reduction (MIR). Simulation study 1 was conducted to analyze the bias of importance and relation measures and contains two null scenarios, one with an increasing number of expression possibilities (A) and one with increasing minor allele frequencies (B). For each scenario, a classification, a regression, and a survival outcome were simulated. The data contains scripts for the simulation and the simulated data. Simulation study 2 was conducted to analyze the selection of variables in the presence of correlations. The data contains scripts for the simulation and the simulated data. Simulation study 3 was conducted to compare the feature selection approaches under realistic correlation structures. It is based on a realistic covariance matrix (mvn.RData) generated from an RNA-microarray dataset of breast cancer patients with 12,592 genes obtained from The Cancer Genome Atlas. The data contains only scripts for the simulation. The data of the real data application is published in two csv files: "vcf.csv" contains the SNP data of the subset of the plastid genome data set of Solanum Section Petota species (Huang et al., 2019) in a variant call format (VCF) file. For this, multiple sequence alignments of 43 genes were conducted with QIAGEN CLC Genomics Workbench 22.0.2 (digitalinsights.qiagen.com), and SNP-sites was subsequently used to generate VCF files. These files were merged into a file of 257 SNPs for further analysis. "vcf_input_withCountry.csv" contains the same data but with an additional country category, in a ready-to-use format for further analysis.
https://doi.org/10.5061/dryad.3r2280ggv
Data was collected for the analysis of the evolutionary relationships among milkweeds. The remaining data was used to test the PickMe algorithm for sample selection in the context of phylogenomic analysis.
Data Descriptions
- Milkweed-Sequence-Files.zip: Contains sequence data for the analysis. By the time of publication, all sequences will be referenced on GenBank.
- estimated-gene-trees-NJ-Uncorrected and estimated-gene-trees-RAxML: Contain all estimated milkweed gene trees as described in the associated article. Sample names were cleaned up for the main manuscript; a log for matching is listed in a text file.
- OldSpeciesTree.cf.tree: The species tree referenced in the paper, based ...