100+ datasets found
  1. MOVIE CORRELATION ANALYSIS-2ND PROJECT

    • kaggle.com
    zip
    Updated Oct 8, 2023
    Cite
    srijanrawat86 (2023). MOVIE CORRELATION ANALYSIS-2ND PROJECT [Dataset]. https://www.kaggle.com/datasets/srijanrawat86/movie-correlation-analysis-2nd-project
    Available download formats: zip (433664 bytes)
    Dataset updated
    Oct 8, 2023
    Authors
    srijanrawat86
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by srijanrawat86

    Released under CC0: Public Domain

    Contents

  2. Statistical analysis of co-occurrence patterns in microbial presence-absence...

    • plos.figshare.com
    html
    Updated May 30, 2023
    Cite
    Kumar P. Mainali; Sharon Bewick; Peter Thielen; Thomas Mehoke; Florian P. Breitwieser; Shishir Paudel; Arjun Adhikari; Joshua Wolfe; Eric V. Slud; David Karig; William F. Fagan (2023). Statistical analysis of co-occurrence patterns in microbial presence-absence datasets [Dataset]. http://doi.org/10.1371/journal.pone.0187132
    Available download formats: html
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Kumar P. Mainali; Sharon Bewick; Peter Thielen; Thomas Mehoke; Florian P. Breitwieser; Shishir Paudel; Arjun Adhikari; Joshua Wolfe; Eric V. Slud; David Karig; William F. Fagan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Drawing on a long history in macroecology, correlation analysis of microbiome datasets is becoming a common practice for identifying relationships or shared ecological niches among bacterial taxa. However, many of the statistical issues that plague such analyses in macroscale communities remain unresolved for microbial communities. Here, we discuss problems in the analysis of microbial species correlations based on presence-absence data. We focus on presence-absence data because this information is more readily obtainable from sequencing studies, especially for whole-genome sequencing, where abundance estimation is still in its infancy. First, we show how Pearson’s correlation coefficient (r) and Jaccard’s index (J)–two of the most common metrics for correlation analysis of presence-absence data–can contradict each other when applied to a typical microbiome dataset. In our dataset, for example, 14% of species-pairs predicted to be significantly correlated by r were not predicted to be significantly correlated using J, while 37.4% of species-pairs predicted to be significantly correlated by J were not predicted to be significantly correlated using r. Mismatch was particularly common among species-pairs with at least one rare species (
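
    The disagreement between Pearson's r and Jaccard's J described above can be seen on a toy example. The sketch below is only an illustration: the two presence-absence vectors are hypothetical, it computes the raw metrics only, and it does not reproduce the significance tests used in the paper.

```python
import math

def pearson_binary(x, y):
    """Pearson's r for two equal-length 0/1 vectors (the phi coefficient)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def jaccard(x, y):
    """Jaccard's index: shared presences divided by samples with either species."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either

# Hypothetical presence (1) / absence (0) of two common species in 10 samples.
species_a = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
species_b = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]

r = pearson_binary(species_a, species_b)
j = jaccard(species_a, species_b)
print(f"r = {r:.3f}, J = {j:.3f}")  # r = 0.375, J = 0.778
```

    Here J is high because the two species share most of their presences, while r is modest because r also counts joint absences, and there is only one of those. This is one way the two metrics can rank the same pair differently.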

  3. Data from: WiBB: An integrated method for quantifying the relative...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1 more
    zip
    Updated Aug 20, 2021
    Cite
    Qin Li; Xiaojun Kou (2021). WiBB: An integrated method for quantifying the relative importance of predictive variables [Dataset]. http://doi.org/10.5061/dryad.xsj3tx9g1
    Available download formats: zip
    Dataset updated
    Aug 20, 2021
    Dataset provided by
    Beijing Normal University
    Field Museum of Natural History
    Authors
    Qin Li; Xiaojun Kou
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    This dataset contains simulated datasets, empirical data, and R scripts described in the paper: “Li, Q. and Kou, X. (2021) WiBB: An integrated method for quantifying the relative importance of predictive variables. Ecography (DOI: 10.1111/ecog.05651)”.

    A fundamental goal of scientific research is to identify the underlying variables that govern crucial processes of a system. Here we propose a new index, WiBB, which integrates the merits of several existing methods: a model-weighting method from information theory (Wi), a standardized regression coefficient method measured by β* (B), and a bootstrap resampling technique (B). We applied WiBB to simulated datasets with known correlation structures, for both linear models (LM) and generalized linear models (GLM), to evaluate its performance. We also applied two other methods, relative sum of weight (SWi) and standardized beta (β*), to compare their performance with the WiBB method in ranking predictor importance under various scenarios. We also applied it to an empirical dataset for the plant genus Mimulus to select bioclimatic predictors of species’ presence across the landscape. Results in the simulated datasets showed that the WiBB method outperformed the β* and SWi methods in scenarios with small and large sample sizes, respectively, and that the bootstrap resampling technique significantly improved the discriminant ability. When testing WiBB in the empirical dataset with GLM, it sensibly identified four important predictors with high credibility out of six candidates in modeling geographical distributions of 71 Mimulus species. This integrated index has great advantages in evaluating predictor importance and hence reducing the dimensionality of data, without losing interpretive power. The simplicity of calculation of the new metric, compared with more sophisticated statistical procedures, makes it a handy method in the statistical toolbox.

    Methods: To simulate independent datasets (size = 1000), we adopted Galipaud et al.’s approach (2014) with custom modifications of the data.simulation function, which used the multivariate normal distribution function rmvnorm in the R package mvtnorm (v1.0-5, Genz et al. 2016). Each dataset was simulated with a preset correlation structure between a response variable (y) and four predictors (x1, x2, x3, x4). The first three (genuine) predictors were set to be strongly, moderately, and weakly correlated with the response variable, respectively (denoted by large, medium, small Pearson correlation coefficients, r), while the correlation between the response and the last (spurious) predictor was set to be zero. We simulated datasets with three levels of differences of correlation coefficients of consecutive predictors, where ∆r = 0.1, 0.2, 0.3, respectively. These three levels of ∆r resulted in three correlation structures between the response and four predictors: (0.3, 0.2, 0.1, 0.0), (0.6, 0.4, 0.2, 0.0), and (0.8, 0.6, 0.3, 0.0), respectively. We repeated the simulation procedure 200 times for each of three preset correlation structures (600 datasets in total), for LM fitting later. For GLM fitting, we modified the simulation procedures with additional steps, in which we converted the continuous response into binary data O (e.g., occurrence data having 0 for absence and 1 for presence). We tested the WiBB method, along with two other methods, relative sum of weight (SWi) and standardized beta (β*), to evaluate the ability to correctly rank predictor importance under various scenarios. The empirical dataset of 71 Mimulus species was compiled from their occurrence coordinates and the corresponding values extracted from climatic layers of the WorldClim dataset (www.worldclim.org), and we applied the WiBB method to infer important predictors for their geographical distributions.
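
    The simulation step described above can be sketched as follows. This is not the authors' R script: it uses NumPy instead of rmvnorm, the function name simulate_dataset is hypothetical, and it assumes the four predictors are mutually uncorrelated, which the description does not state. The conversion to a binary response for GLM fitting is not covered.

```python
import numpy as np

def simulate_dataset(r, n=1000, seed=0):
    """Draw n rows of (y, x1..xk) from a multivariate normal distribution.

    r holds the preset Pearson correlations between y and each predictor.
    Assumption (not from the source): predictors are mutually uncorrelated.
    """
    r = np.asarray(r, dtype=float)
    k = r.size
    corr = np.eye(k + 1)
    corr[0, 1:] = r  # first row/column is the response y
    corr[1:, 0] = r
    # Cholesky factorisation raises LinAlgError if corr is not positive definite.
    np.linalg.cholesky(corr)
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(np.zeros(k + 1), corr, size=n)

data = simulate_dataset([0.6, 0.4, 0.2, 0.0])
sample_r = np.corrcoef(data, rowvar=False)[0, 1:]
print(np.round(sample_r, 2))  # should be close to the preset correlations
```

    Under the independence assumption, the structure (0.8, 0.6, 0.3, 0.0) is not a valid correlation matrix, because the squared correlations sum to 1.09, which exceeds 1, so the function raises an error for it. The original simulation presumably handled that case differently, for example by allowing correlation among predictors.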

  4. A Bayesian method for detecting pairwise associations in compositional data

    • plos.figshare.com
    docx
    Updated May 30, 2023
    Cite
    Emma Schwager; Himel Mallick; Steffen Ventz; Curtis Huttenhower (2023). A Bayesian method for detecting pairwise associations in compositional data [Dataset]. http://doi.org/10.1371/journal.pcbi.1005852
    Available download formats: docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Emma Schwager; Himel Mallick; Steffen Ventz; Curtis Huttenhower
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Compositional data consist of vectors of proportions normalized to a constant sum from a basis of unobserved counts. The sum constraint makes inference on correlations between unconstrained features challenging due to the information loss from normalization. However, such correlations are of long-standing interest in fields including ecology. We propose a novel Bayesian framework (BAnOCC: Bayesian Analysis of Compositional Covariance) to estimate a sparse precision matrix through a LASSO prior. The resulting posterior, generated by MCMC sampling, allows uncertainty quantification of any function of the precision matrix, including the correlation matrix. We also use a first-order Taylor expansion to approximate the transformation from the unobserved counts to the composition in order to investigate what characteristics of the unobserved counts can make the correlations more or less difficult to infer. On simulated datasets, we show that BAnOCC infers the true network as well as previous methods while offering the advantage of posterior inference. Larger and more realistic simulated datasets further showed that BAnOCC performs well as measured by type I and type II error rates. Finally, we apply BAnOCC to a microbial ecology dataset from the Human Microbiome Project, which in addition to reproducing established ecological results revealed unique, competition-based roles for Proteobacteria in multiple distinct habitats.
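
    The difficulty caused by the sum constraint can be illustrated with a small simulation. The sketch below is not BAnOCC. It only shows the problem that method addresses, using hypothetical log-normal counts that are independent by construction.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical unobserved counts: 3 independent features, 5000 samples.
counts = rng.lognormal(mean=0.0, sigma=1.0, size=(5000, 3))

# Normalise each sample to a constant sum of 1 (the composition).
composition = counts / counts.sum(axis=1, keepdims=True)

r_counts = np.corrcoef(counts, rowvar=False)[0, 1]
r_composition = np.corrcoef(composition, rowvar=False)[0, 1]

print(f"correlation of counts:      {r_counts:+.2f}")
print(f"correlation of proportions: {r_composition:+.2f}")
```

    Although the counts are uncorrelated, the proportions are negatively correlated, because a larger share for one feature forces smaller shares for the others. Naive correlation of the composition therefore says little about the underlying counts.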

  5. Mental Health and Screen Time Correlation Dataset

    • kaggle.com
    zip
    Updated Sep 10, 2024
    Cite
    Yaj Kotak (2024). Mental Health and Screen Time Correlation Dataset [Dataset]. https://www.kaggle.com/datasets/yajkotak/mental-health-and-screen-time-correlation-dataset
    Available download formats: zip (191096 bytes)
    Dataset updated
    Sep 10, 2024
    Authors
    Yaj Kotak
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Yaj Kotak

    Released under Apache 2.0

    Contents

  6. health-dataset-correlation

    • kaggle.com
    zip
    Updated Aug 31, 2024
    Cite
    atulpatra28 (2024). health-dataset-correlation [Dataset]. https://www.kaggle.com/datasets/atulpatra28/health-dataset-correlation
    Available download formats: zip (5137 bytes)
    Dataset updated
    Aug 31, 2024
    Authors
    atulpatra28
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by atulpatra28

    Released under Apache 2.0

    Contents

  7. Correlation_and_Hypothesis_Tests R Studio

    • kaggle.com
    zip
    Updated Nov 26, 2025
    Cite
    Dr. Nagendra (2025). Correlation_and_Hypothesis_Tests R Studio [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/correlation-and-hypothesis-tests-r-studio
    Available download formats: zip (55979 bytes)
    Dataset updated
    Nov 26, 2025
    Authors
    Dr. Nagendra
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    A cleaned and prepared dataset for introductory statistical analysis, designed specifically for practicing correlation analysis and hypothesis testing in R Studio. It includes multiple variables suitable for comparing distributions and testing relationships, and lets users replicate standard statistical procedures (e.g., t-tests, Pearson correlation, ANOVA). An ideal learning resource for students new to statistical computing and data science.

  8. Correlation matrix of the used datasets.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated May 31, 2019
    Cite
    El-Shafie, Ahmed; Afan, Haitham Abdulmohsin; Kisi, Ozgur; Singh, Vijay P.; Mohd, Nuruol Syuhadaa; Karami, Hojat; Farzin, Saeed; Ferdowsi, Ahmad; Malek, M. A.; Lai, Sai Hin; Mousavi, Sayed Farhad; Ahmed, Ali Najah; Ehteram, Mohammad (2019). Correlation matrix of the used datasets. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000127150
    Dataset updated
    May 31, 2019
    Authors
    El-Shafie, Ahmed; Afan, Haitham Abdulmohsin; Kisi, Ozgur; Singh, Vijay P.; Mohd, Nuruol Syuhadaa; Karami, Hojat; Farzin, Saeed; Ferdowsi, Ahmad; Malek, M. A.; Lai, Sai Hin; Mousavi, Sayed Farhad; Ahmed, Ali Najah; Ehteram, Mohammad
    Description

    Correlation matrix of the used datasets.

  9. Sample Correlation Data

    • kaggle.com
    zip
    Updated Dec 4, 2023
    Cite
    Forrest Carlton (2023). Sample Correlation Data [Dataset]. https://www.kaggle.com/datasets/forrestcarlton1/sample-correlation-data/discussion
    Available download formats: zip (1173 bytes)
    Dataset updated
    Dec 4, 2023
    Authors
    Forrest Carlton
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Forrest Carlton

    Released under Apache 2.0

    Contents

  10. Data from: Example Groundwater-Level Datasets and Benchmarking Results for...

    • catalog.data.gov
    • data.usgs.gov
    • +1 more
    Updated Nov 27, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Example Groundwater-Level Datasets and Benchmarking Results for the Automated Regional Correlation Analysis for Hydrologic Record Imputation (ARCHI) Software Package [Dataset]. https://catalog.data.gov/dataset/example-groundwater-level-datasets-and-benchmarking-results-for-the-automated-regional-cor
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    This data release provides two example groundwater-level datasets used to benchmark the Automated Regional Correlation Analysis for Hydrologic Record Imputation (ARCHI) software package (Levy and others, 2024). The first dataset contains groundwater-level records and site metadata for wells located on Long Island, New York (NY) and some surrounding mainland sites in New York and Connecticut. The second dataset contains groundwater-level records and site metadata for wells located in the southeastern San Joaquin Valley of the Central Valley, California (CA). For ease of exposition these are referred to as NY and CA datasets, respectively. Both datasets are formatted with column headers that can be read by the ARCHI software package within the R computing environment. These datasets were used to benchmark the imputation accuracy of three ARCHI model settings (OLS, ridge, and MOVE.1) against the widely used imputation program missForest (Stekhoven and Bühlmann, 2012). The ARCHI program was used to process the NY and CA datasets on monthly and annual timesteps, respectively, filter out sites with insufficient data for imputation, and create 200 test datasets from each of the example datasets with 5 percent of observations removed at random (herein, referred to as "holdouts"). Imputation accuracy for test datasets was assessed using normalized root mean square error (NRMSE), which is the root mean square error divided by the standard deviation of the observed holdout values. ARCHI produces prediction intervals (PIs) using a non-parametric bootstrapping routine, which were assessed by computing a coverage rate (CR) defined as the proportion of holdout observations falling within the estimated PI. 
    The multiple regression models included with the ARCHI package (OLS and ridge) were further tested on all test datasets at eleven different levels of the p_per_n input parameter, which limits the maximum ratio of regression model predictors (p) per observations (n) as a decimal fraction greater than zero and less than or equal to one. This data release contains ten tables formatted as tab-delimited text files. The “CA_data.txt” and “NY_data.txt” tables contain 243,094 and 89,997 depth-to-groundwater measurement values (value, in feet below land surface) indexed by site identifier (site_no) and measurement date (date) for CA and NY datasets, respectively. The “CA_sites.txt” and “NY_sites.txt” tables contain site metadata for the 4,380 and 476 unique sites included in the CA and NY datasets, respectively. The “CA_NRMSE.txt” and “NY_NRMSE.txt” tables contain NRMSE values computed by imputing 200 test datasets with 5 percent random holdouts to assess imputation accuracy for three different ARCHI model settings and missForest using CA and NY datasets, respectively. The “CA_CR.txt” and “NY_CR.txt” tables contain CR values used to evaluate non-parametric PIs generated by bootstrapping regressions with three different ARCHI model settings using the CA and NY test datasets, respectively. The “CA_p_per_n.txt” and “NY_p_per_n.txt” tables contain mean NRMSE values computed for 200 test datasets with 5 percent random holdouts at 11 different levels of p_per_n for OLS and ridge models compared to training error for the same models on the entire CA and NY datasets, respectively.

    References Cited

    Levy, Z.F., Stagnitta, T.J., and Glas, R.L., 2024, ARCHI: Automated Regional Correlation Analysis for Hydrologic Record Imputation, v1.0.0: U.S. Geological Survey software release, https://doi.org/10.5066/P1VVHWKE.

    Stekhoven, D.J., and Bühlmann, P., 2012, MissForest—non-parametric missing value imputation for mixed-type data: Bioinformatics 28(1), 112-118. https://doi.org/10.1093/bioinformatics/btr597.
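
    The two accuracy measures defined in this description, NRMSE and coverage rate (CR), can be sketched as below. This is not code from the ARCHI package: the function names and the example numbers are hypothetical, and the use of the sample standard deviation (ddof=1, as R's sd() does) is an assumption.

```python
import numpy as np

def nrmse(observed, imputed):
    """Root mean square error divided by the standard deviation of the
    observed holdout values. Assumption: sample standard deviation (ddof=1)."""
    observed = np.asarray(observed, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    rmse = np.sqrt(np.mean((imputed - observed) ** 2))
    return rmse / np.std(observed, ddof=1)

def coverage_rate(observed, lower, upper):
    """Proportion of holdout observations inside their prediction interval."""
    observed = np.asarray(observed, dtype=float)
    inside = (observed >= np.asarray(lower)) & (observed <= np.asarray(upper))
    return float(np.mean(inside))

# Hypothetical holdout values (feet below land surface) and their imputations.
observed = np.array([10.0, 12.0, 14.0, 16.0])
imputed = np.array([11.0, 12.0, 13.0, 17.0])

print(round(nrmse(observed, imputed), 4))                             # 0.3354
print(coverage_rate(observed, imputed - 0.5, imputed + 0.5))          # 0.25
```

    An NRMSE below 1 means the imputation error is smaller than the spread of the held-out values themselves. A well-calibrated 95 percent prediction interval would be expected to give a CR near 0.95.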

  11. Partner expectations survey dataset

    • kaggle.com
    zip
    Updated Mar 20, 2024
    Cite
    Tharun (2024). Partner expectations survey dataset [Dataset]. https://www.kaggle.com/datasets/tharunprabu/partner-preference
    Available download formats: zip (1328 bytes)
    Dataset updated
    Mar 20, 2024
    Authors
    Tharun
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    📊 Partner's Expectations Survey Dataset: Insights from Ongoing Exploration

    Embark on an insightful journey into relationship dynamics with this evolving dataset! With a small initial set of responses, the "Relationship Predictor Survey" offers a snapshot of the diverse factors influencing romantic connections. As I continue to collect data, this dataset will grow, providing a unique opportunity for ongoing analysis and exploration.

    Key Features:

    🌐 Ongoing Updates: I'm committed to regularly updating this dataset with new responses, expanding the scope of my exploration.

    🧩 Initial Insights: While currently modest, the dataset already encompasses a variety of factors, including social skills, personality traits, interests, and more.

    🚀 Community Collaboration: Join me in unraveling the nuances of relationships by contributing to and engaging with this evolving dataset.

    How to Use:

    📈 Track Changes: Stay tuned for updates as I will add more responses over time.

    🤝 Collaborate: Share your own insights and analyses to enrich the collective understanding.

    📑 Flexible Research: Use the dataset for ongoing research projects or personal exploration.

    Acknowledgments: A sincere thank you to the initial participants who kick-started this project. Your input lays the foundation for a growing resource that benefits the community.

    Please help me gather more data: https://forms.gle/xJ7W6SRH917HLMsaA

    Join me in this continuous exploration of relationships. As I gather more responses, the dataset will become a dynamic resource for insights and discussions. Happy analyzing! 🌱

  12. Data sets used for the correlation matrixes shown in Fig 6 as well as S10...

    • datasetcatalog.nlm.nih.gov
    Updated Nov 8, 2023
    Cite
    Wagner, Catherine E.; Ehrenfels, Benedikt; Sweke, Emmanuel A.; Dinkel, Christian; Kalangali, Anthony; Junker, Julian; Mbonde, Athanasio S.; Schubert, Carsten J.; Mosille, Julieth B.; Kimirei, Ismael A.; Callbeck, Cameron M.; Namutebi, Demmy; Seehausen, Ole; Wehrli, Bernhard (2023). Data sets used for the correlation matrixes shown in Fig 6 as well as S10 and S11 Figs. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000965261
    Dataset updated
    Nov 8, 2023
    Authors
    Wagner, Catherine E.; Ehrenfels, Benedikt; Sweke, Emmanuel A.; Dinkel, Christian; Kalangali, Anthony; Junker, Julian; Mbonde, Athanasio S.; Schubert, Carsten J.; Mosille, Julieth B.; Kimirei, Ismael A.; Callbeck, Cameron M.; Namutebi, Demmy; Seehausen, Ole; Wehrli, Bernhard
    Description

    Of the chosen variables, chlorophyll-a and all POM-related parameters depict depth-integrated values. The δ13C, δ15N, and C:N values from all other food web members (except POM) represent average values from the respective sites. Gaps in the data set are highlighted in white. Gaps at the northern (station 1) or southern (station 9) extremities of the lake were filled by assuming the same value as from the neighbouring site. Other gaps were filled by calculating the average value between the two neighbouring sites. Rows (i.e. stations) used for calculating the correlation matrixes are highlighted in bold black font. (XLSX)

  13. Evaluating Correlation Between Measurement Samples in Reverberation Chambers...

    • nist.gov
    • datasets.ai
    • +2 more
    Updated Apr 6, 2023
    + more versions
    Cite
    National Institute of Standards and Technology (2023). Evaluating Correlation Between Measurement Samples in Reverberation Chambers Using Clustering [Dataset]. http://doi.org/10.18434/mds2-2986
    Dataset updated
    Apr 6, 2023
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    License

    https://www.nist.gov/open/license

    Description

    Evaluating Correlation Between Measurement Samples in Reverberation Chambers Using Clustering

    Abstract: Traditionally, in reverberation chamber (RC) measurements, autocorrelation or correlation-matrix methods have been applied to evaluate measurement correlation. In this article, we introduce the use of clustering based on correlative distance to group correlated measurements. We apply the method to measurements taken in an RC using one and two paddles to stir the electromagnetic fields and applying decreasing angular steps between consecutive paddle positions. The results using varying correlation threshold values demonstrate that the method calculates the number of effective samples and allows discerning outliers, i.e., uncorrelated measurements, and clusters of correlated measurements. This calculation method, if verified, will allow non-sequential stir sequence design and, thereby, reduce testing time.

    Keywords: Correlation, Pearson correlation coefficient (PCC), reverberation chambers (RC), mode-stirring samples, correlative distance, clustering analysis, adjacency matrix.
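
    One simple way to group correlated measurements by a correlation threshold is sketched below. It is not the article's algorithm, whose exact definition of correlative distance is not given here. The sketch assumes real-valued data, links two measurements when the absolute Pearson correlation reaches the threshold, and reports the connected components of the resulting adjacency matrix as clusters. The function name and the data are hypothetical.

```python
import numpy as np

def correlation_clusters(samples, threshold):
    """Label each row of samples with a cluster index.

    Rows are linked when |Pearson r| >= threshold (an assumed criterion);
    clusters are the connected components of that adjacency matrix.
    """
    corr = np.corrcoef(samples)            # rows are treated as measurements
    adjacency = np.abs(corr) >= threshold
    n = adjacency.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:                        # depth-first search over links
            i = stack.pop()
            for j in np.flatnonzero(adjacency[i]):
                if labels[int(j)] == -1:
                    labels[int(j)] = current
                    stack.append(int(j))
        current += 1
    return labels

# Hypothetical measurements: rows 0 and 1 are perfectly correlated.
samples = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [3.0, 5.0, 7.0, 9.0, 11.0],
    [5.0, 1.0, 4.0, 2.0, 3.0],
    [1.0, 3.0, 1.0, 3.0, 1.0],
])

print(correlation_clusters(samples, threshold=0.9))  # [0, 0, 1, 2]
print(correlation_clusters(samples, threshold=0.8))  # [0, 0, 1, 1]
```

    Lowering the threshold merges more measurements into the same cluster, so the cluster count, a rough stand-in for the number of effective samples, drops from three to two in this example. Singleton clusters correspond to measurements that are uncorrelated with all others at the chosen threshold.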

  14. Correlation matrix between factor scores based on principal component...

    • plos.figshare.com
    xls
    Updated Sep 1, 2023
    + more versions
    Cite
    Noori Akhtar-Danesh (2023). Correlation matrix between factor scores based on principal component extraction and different rotation techniques. [Dataset]. http://doi.org/10.1371/journal.pone.0290728.t002
    Available download formats: xls
    Dataset updated
    Sep 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Noori Akhtar-Danesh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Correlation matrix between factor scores based on principal component extraction and different rotation techniques.

  15. Pearson correlation among annotators.

    • plos.figshare.com
    xls
    Updated May 23, 2024
    + more versions
    Cite
    Marzieh Babaali; Afsaneh Fatemi; Mohammad Ali Nematbakhsh (2024). Pearson correlation among annotators. [Dataset]. http://doi.org/10.1371/journal.pone.0301696.t004
    Available download formats: xls
    Dataset updated
    May 23, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Marzieh Babaali; Afsaneh Fatemi; Mohammad Ali Nematbakhsh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In the domain of question subjectivity classification, there exists a need for detailed datasets that can foster advancements in Automatic Subjective Question Answering (ASQA) systems. Addressing the prevailing research gaps, this paper introduces the Fine-Grained Question Subjectivity Dataset (FQSD), which comprises 10,000 questions. The dataset distinguishes between subjective and objective questions and offers additional categorizations such as Subjective-types (Target, Attitude, Reason, Yes/No, None) and Comparison-form (Single, Comparative). Annotation reliability was confirmed via robust evaluation techniques, yielding a Fleiss’s Kappa score of 0.76 and Pearson correlation values up to 0.80 among three annotators. We benchmarked FQSD against existing datasets such as (Yu, Zha, and Chua 2012), SubjQA (Bjerva 2020), and ConvEx-DS (Hernandez-Bocanegra 2021). Our dataset excelled in scale, linguistic diversity, and syntactic complexity, establishing a new standard for future research. We employed visual methodologies to provide a nuanced understanding of the dataset and its classes. Utilizing transformer-based models like BERT, XLNET, and RoBERTa for validation, RoBERTa achieved an outstanding F1-score of 97%, confirming the dataset’s efficacy for the advanced subjectivity classification task. Furthermore, we utilized Local Interpretable Model-agnostic Explanations (LIME) to elucidate model decision-making, ensuring transparent and reliable model predictions in subjectivity classification tasks.

  16. Correlation of network seed genes in each of the four expression datasets of...

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Jeanne M. Serb; Megan C. Orr; M. Heather West Greenlee (2023). Correlation of network seed genes in each of the four expression datasets of mouse. [Dataset]. http://doi.org/10.1371/journal.pone.0012525.t002
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jeanne M. Serb; Megan C. Orr; M. Heather West Greenlee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The mouse expression datasets are: I [20]; II [21]; III [22]; IV [23]. Numbers in parentheses are the positive or negative correlation coefficients of seed genes in each mouse dataset. “-” indicates that the seed gene is present in the dataset, but is not correlated with other seed genes. “NA” indicates that the seed gene is not present in the dataset.

  17. Overall scoring of methods for the Datasets (one plus equals one point; the...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Kornel Chrominski; Magdalena Tkacz (2023). Overall scoring of methods for the Datasets (one plus equals one point; the more, the better). [Dataset]. http://doi.org/10.1371/journal.pone.0128845.t009
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Kornel Chrominski; Magdalena Tkacz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overall scoring of methods for the Datasets (one plus equals one point; the more, the better).

  18. An experiment on the reliability analysis of megaproject sustainability

    • data.mendeley.com
    • narcis.nl
    Updated Jan 5, 2021
    Cite
    Zhen Chen (2021). An experiment on the reliability analysis of megaproject sustainability [Dataset]. http://doi.org/10.17632/gy2h2ybtjg.1
    Dataset updated
    Jan 5, 2021
    Authors
    Zhen Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Hypothesis: The reliability can be adopted to quantitatively measure the sustainability of mega-projects.

    Presentation: This dataset shows two scenario-based examples to establish an initial reliability assessment of megaproject sustainability. Data were derived from the author’s assumptions about differences between scenarios A and B. There are two sheets in this Microsoft Excel file: a comparison between the two scenarios using a Fault Tree Analysis model, and a correlation analysis between reliability and unavailability.

    Notable findings: This exploratory experiment found that reliability can be used to quantitatively measure megaproject sustainability, and that there is a negative correlation between reliability and unavailability among 11 related events associated with sustainability goals in the life-cycle of a megaproject.

    Interpretation: Results from data analysis using the two sheets can be useful to inform decision making on megaproject sustainability. For example, the reliability of achieving sustainability goals can be enhanced by decreasing the unavailability or the failure at individual work stages in megaproject delivery.

    Implication: This dataset file can be used to perform reliability analysis in other experiments to assess megaproject sustainability.

  19. Data sets of the study.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated May 31, 2023
    + more versions
    Cite
    Shouxi Zhu; Hongbin Gu (2023). Data sets of the study. [Dataset]. http://doi.org/10.1371/journal.pone.0283577.s001
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Shouxi Zhu; Hongbin Gu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: This study aimed to explore the adverse influences of mobile phone usage on pilots’ status, so as to improve flight safety.

    Methods: A questionnaire was designed, and a cluster random sampling method was adopted. Pilots of Shandong Airlines were investigated on the use of mobile phones. The data was analyzed by frequency statistics, linear regression and other statistical methods.

    Results: A total of 340 questionnaires were distributed and 317 were returned, 315 of which were valid. The results showed that 239 pilots (75.87%) used mobile phones as the main means of entertainment in their leisure time. There was a significant negative correlation between age of pilots and playing mobile games (p

  20. Correlations between the N score time series from the genome dataset and the...

    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Linda Zheng; Paul J. Wayper; Adrian J. Gibbs; Mathieu Fourment; Brendan C. Rodoni; Mark J. Gibbs (2023). Correlations between the N score time series from the genome dataset and the N score time series from other datasets for each site. [Dataset]. http://doi.org/10.1371/journal.pone.0001586.t003
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Linda Zheng; Paul J. Wayper; Adrian J. Gibbs; Mathieu Fourment; Brendan C. Rodoni; Mark J. Gibbs
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    *N/A–no variants occurred at this site in the PPV genomes
