100+ datasets found
  1. deita-complexity-scorer-data

    • huggingface.co
    Updated Mar 1, 2024
    Cite
    HKUST NLP Group (2024). deita-complexity-scorer-data [Dataset]. https://huggingface.co/datasets/hkust-nlp/deita-complexity-scorer-data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 1, 2024
    Dataset authored and provided by
    HKUST NLP Group
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Card for Deita Complexity Scorer Training Data

    GitHub | Paper Deita is an open-sourced project designed to facilitate Automatic Data Selection for instruction tuning in Large Language Models (LLMs). This dataset includes data for training Deita Complexity Scorer. Model Family: Other models and the dataset are found in the Deita Collection

      Performance
    

    Model Align Data Size MT-Bench AlpacaEval(%) OpenLLM (Avg.)

    Proprietary Models

    GPT-4-Turbo… See the full description on the dataset page: https://huggingface.co/datasets/hkust-nlp/deita-complexity-scorer-data.

  2. Banking data - complexity

    • figshare.com
    xlsx
    Updated Sep 16, 2023
    Cite
    Sebastian Theis (2023). Banking data - complexity [Dataset]. http://doi.org/10.6084/m9.figshare.22864019.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Sep 16, 2023
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Sebastian Theis
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Banking data (RIBITS) - freshwater targets and bank specs.

  3. Supplemental Material for the paper entitled “Handling Data Complexity and...

    • ieee-dataport.org
    Updated Oct 7, 2025
    Cite
    Yuanting Yan (2025). Supplemental Material for the paper entitled “Handling Data Complexity and Class-imbalance for Software Defect Prediction”. [Dataset]. https://ieee-dataport.org/documents/supplemental-material-paper-entitled-handling-data-complexity-and-class-imbalance
    Explore at:
    Dataset updated
    Oct 7, 2025
    Authors
    Yuanting Yan
    Description

    This is the supplemental material for the paper entitled “Handling Data Complexity and Class-imbalance for Software Defect Prediction”.

  4. Network complexity data

    • figshare.com
    txt
    Updated Jan 19, 2016
    Cite
    Amir Rostami; Hernan Mondani (2016). Network complexity data [Dataset]. http://doi.org/10.6084/m9.figshare.1297161.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Amir Rostami; Hernan Mondani
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Data files for the network complexity data.

  5. Replication Data for: "Complexity and Sophistication"

    • dataverse.harvard.edu
    Updated Jul 17, 2023
    Cite
    Leandro Carvalho; Dan Silverman (2023). Replication Data for: "Complexity and Sophistication" [Dataset]. http://doi.org/10.7910/DVN/JVFTEG
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 17, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Leandro Carvalho; Dan Silverman
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This is the replication package for "Complexity and Sophistication," accepted in 2023 by the Journal of Political Economy Microeconomics.

  6. Features arising following single feature complexity analysis.

    • plos.figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Neila Mezghani; Imene Mechmeche; Amar Mitiche; Youssef Ouakrim; Jacques A. de Guise (2023). Features arising following single feature complexity analysis. [Dataset]. http://doi.org/10.1371/journal.pone.0202348.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Neila Mezghani; Imene Mechmeche; Amar Mitiche; Youssef Ouakrim; Jacques A. de Guise
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Features arising following single feature complexity analysis.

  7. Data from: Complexity of models.

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Mar 14, 2019
    Cite
    Heide-Jørgensen, Mads Peter; Ngô, Manh Cuong; Ditlevsen, Susanne (2019). Complexity of models. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000097772
    Explore at:
    Dataset updated
    Mar 14, 2019
    Authors
    Heide-Jørgensen, Mads Peter; Ngô, Manh Cuong; Ditlevsen, Susanne
    Description

    Runtimes and number of variables for different state distributions and for 2, 3 and 4 states for covariate model 1. Runtimes are on Intel Xeon E5-2697v2 @ 2.7 GHz.

  8. Data from: Sentence Complexity Dataset

    • da-ra.de
    • search.gesis.org
    Updated Aug 19, 2017
    Cite
    Sanja Stajner (2017). Sentence Complexity Dataset [Dataset]. http://doi.org/10.7801/243
    Explore at:
    Dataset updated
    Aug 19, 2017
    Dataset provided by
    da|ra
    Mannheim University Library
    Authors
    Sanja Stajner
    Description

    The metadata set does not comprise any description or summary. The information has not been provided.

  9. Growth Projections and Complexity Rankings

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Aug 21, 2025
    Cite
    The Growth Lab at Harvard University (2025). Growth Projections and Complexity Rankings [Dataset]. http://doi.org/10.7910/DVN/XTAQMC
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 21, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    The Growth Lab at Harvard University
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 1995 - Dec 31, 2023
    Description

    Each year, researchers at Harvard's Growth Lab release growth forecasts for the upcoming decade as well as annual rankings of countries by economic complexity. The Economic Complexity Index (ECI) ranking is a measure of the amount of capabilities and knowhow of a given country, determined by the diversity, ubiquity, and complexity of the products it exports. Growth projections are calculated through a process largely based on determining whether a country's economic complexity is higher or lower than expected given its level of income. We expect countries whose economic complexity is greater than we would expect for their level of income to grow faster than those that are "too rich" for their current level of complexity. In this data, a country's growth projection value for a given year is for the decade beginning with that year. For example, a value in a 2017 row is the projection of annualized growth for 2017–2027.

  10. Background complexity code and data - Datasets - data.bris

    • data.bris.ac.uk
    Updated Feb 7, 2025
    Cite
    (2025). Background complexity code and data - Datasets - data.bris [Dataset]. https://data.bris.ac.uk/data/dataset/1bxjeumjxu6qi2x2eu8otv7wt4
    Explore at:
    Dataset updated
    Feb 7, 2025
    Description

    Data and code to accompany Lunt et al., "Background choice is mediated by complexity in cephalopods". Complete download (zip, 2.4 MiB).

  11. Data from: Predicting human complexity perception of real-world scenes

    • data.niaid.nih.gov
    • datadryad.org
    • +1more
    zip
    Updated Dec 29, 2020
    Cite
    Fintan Nagle; Nilli Lavie (2020). Predicting human complexity perception of real-world scenes [Dataset]. http://doi.org/10.5061/dryad.3fs556j
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 29, 2020
    Dataset provided by
    ,
    Authors
    Fintan Nagle; Nilli Lavie
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Perceptual load is a well-established determinant of attentional engagement in a task. So far, perceptual load has typically been manipulated by increasing either the number of task-relevant items or the perceptual processing demand (e.g. conjunction vs. feature tasks). The tasks used often involved rather simple visual displays (e.g. letters or single objects). How can perceptual load be operationalised for richer, real-world images? A promising proxy is the visual complexity of an image. However, current predictive models for visual complexity have limited applicability to diverse real-world images. Here we modelled visual complexity using a deep convolutional neural network trained to learn perceived ratings of visual complexity. We presented 53 observers with 4000 images from the PASCAL VOC dataset, obtaining 75,020 2AFC paired comparisons across observers. Image visual complexity scores were obtained using the TrueSkill algorithm. A CNN with weights pre-trained on an object recognition task predicted complexity ratings with r=0.83. In contrast, feature-based models as used in the literature, working on image statistics such as entropy, edge density and JPEG compression ratio, only achieved r = 0.70. Thus, our model offers a promising method to quantify the perceptual load of real-world scenes through visual complexity.

  12. Thresholds for single feature complexity.

    • figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Neila Mezghani; Imene Mechmeche; Amar Mitiche; Youssef Ouakrim; Jacques A. de Guise (2023). Thresholds for single feature complexity. [Dataset]. http://doi.org/10.1371/journal.pone.0202348.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Neila Mezghani; Imene Mechmeche; Amar Mitiche; Youssef Ouakrim; Jacques A. de Guise
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Thresholds for single feature complexity.

  13. Data for ‘Computational Complexity Explains Neural Differences in Quantifier...

    • dataverse.no
    • dataverse.azure.uit.no
    • +1more
    Updated Sep 28, 2023
    Cite
    Heming Strømholt Bremnes (2023). Data for ‘Computational Complexity Explains Neural Differences in Quantifier Verification’ [Dataset]. http://doi.org/10.18710/M6VT6Z
    Explore at:
    Available download formats: txt(589), type/x-r-syntax(790), txt(522), txt(3138), xlsx(231785), zip(6872383938), zip(8675467381), application/matlab-mat(173222611), xlsx(309887), txt(5426), application/matlab-mat(173366275), type/x-r-syntax(1187), txt(12050), txt(513), txt(4419), txt(3826)
    Dataset updated
    Sep 28, 2023
    Dataset provided by
    DataverseNO
    Authors
    Heming Strømholt Bremnes
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Different classes of quantifiers provably require different verification algorithms with different complexity profiles. The algorithm for proportional quantifiers, like `most', is more complex than that for nonproportional quantifiers, like `all' and `three'. We tested the hypothesis that different complexity profiles affect ERP responses during sentence verification, but not during sentence comprehension. In experiment 1, participants had to determine the truth value of a sentence relative to a previously presented array of geometric objects. We observed a sentence-final negative effect of truth value, modulated by quantifier class. Proportional quantifiers elicited a sentence-internal positivity compared to nonproportional quantifiers, in line with their different verification profiles. In experiment 2, the same stimuli were shown, followed by comprehension questions instead of verification. ERP responses specific to proportional quantifiers disappeared in experiment 2, suggesting that they are only evoked in a verification task and thus reflect the verification procedure itself. The present dataset contains behavioural and EEG data from both experiments, as well as analysis scripts for both data types in R and Matlab/FieldTrip.

  14. Background complexity code and data v2 - Datasets - data.bris

    • data.bris.ac.uk
    Updated Aug 15, 2025
    Cite
    (2025). Background complexity code and data v2 - Datasets - data.bris [Dataset]. https://data.bris.ac.uk/data/dataset/1fs7hgxfhdypg2nal3fuppxjvw
    Explore at:
    Dataset updated
    Aug 15, 2025
    Description

    Revised data and code to accompany Lunt et al. "Intensity contrast drives background choice in cephalopods"

  15. Data from: the-complexity-trap

    • huggingface.co
    Updated Oct 14, 2025
    Cite
    JetBrains Research (2025). the-complexity-trap [Dataset]. https://huggingface.co/datasets/JetBrains-Research/the-complexity-trap
    Explore at:
    Dataset updated
    Oct 14, 2025
    Dataset provided by
    JetBrains (http://jetbrains.com/)
    Authors
    JetBrains Research
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    The Complexity Trap: Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management

    This dataset contains our raw experimental data (i.e., agent trajectories) accompanying the paper "The Complexity Trap: Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management" and Tobias Lindenbauer's Master's thesis. The data in this repository are compressed into .tar.gz archives. For detailed instructions on how to use these data… See the full description on the dataset page: https://huggingface.co/datasets/JetBrains-Research/the-complexity-trap.
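
    Since the description above says the data ship as .tar.gz archives, the following is a minimal sketch of unpacking one with Python's standard library. The function name `extract_archive` and the paths in the usage comment are hypothetical; the listing does not show the archive names or their internal layout, so consult the dataset page for the authors' own instructions.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a gzip-compressed tar archive and return its member names.

    Hypothetical helper: the real archive names and layout are not given
    in the listing above.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter where this Python supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names


# Usage (placeholder paths, not taken from the dataset):
# members = extract_archive("downloads/some-archive.tar.gz", "trajectories/")
```

    Returning the member names makes it easy to check what was unpacked before loading any trajectory files.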

  16. Research data, sources and documents for thesis on Exploring Complexity...

    • data.niaid.nih.gov
    • nde-dev.biothings.io
    • +2more
    Updated Jan 24, 2020
    Cite
    Mike A. Marin (2020). Research data, sources and documents for thesis on Exploring Complexity Metrics for Artifact-Centric Business Process Models [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1241594
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    University of South Africa (UNISA)
    Authors
    Mike A. Marin
    Description

    Research data, sources and documents for thesis on Exploring Complexity Metrics for Artifact-Centric Business Process Models

    This repository contains the supplemental material for the thesis "Exploring Complexity Metrics for Artifact-Centric Business Process Models" by Marin, Mike A., Ph.D., University of South Africa (South Africa), 2017.

  17. Economic Complexity and International Trade

    • kaggle.com
    zip
    Updated Jun 1, 2025
    Cite
    Ali Jalaali (2025). Economic Complexity and International Trade [Dataset]. https://www.kaggle.com/datasets/alijalali4ai/economic-complexity-and-international-trade
    Explore at:
    Available download formats: zip (39653519 bytes)
    Dataset updated
    Jun 1, 2025
    Authors
    Ali Jalaali
    Description

    This dataset compiles key indicators of global trade and economic complexity, curated from the Harvard Growth Lab's Atlas of Economic Complexity.

    Data spans multiple classification systems (HS12, HS92, SITC, and Services), enabling a wide range of cross-national and historical trade analyses.

    Data Sources

    All data is directly downloaded from The Atlas of Economic Complexity: Downloads - Rankings. This project repackages publicly available data into a Kaggle-friendly format for exploration and analysis.

  18. Data from: iRead4Skills Dataset 2: annotated corpora by level of complexity...

    • investigacion.usc.gal
    • chef.afue.org
    • +3more
    Updated 2024
    Cite
    Pintard, Alice; François, Thomas; Justine, Nagant de Deuxchaisnes; Barbosa, Sílvia; Reis, Maria Leonor; Moutinho, Michell; Monteiro, Ricardo; Amaro, Raquel; Correia, Susana; Rodríguez Rey, Sandra; Mu, Keran; Garcia González, Marcos; Bernárdez Braña, André; Blanco Escoda, Xavier (2024). iRead4Skills Dataset 2: annotated corpora by level of complexity for FR, PT and SP [Dataset]. https://investigacion.usc.gal/documentos/67321bc6aea56d4af04826f9
    Explore at:
    Dataset updated
    2024
    Authors
    Pintard, Alice; François, Thomas; Justine, Nagant de Deuxchaisnes; Barbosa, Sílvia; Reis, Maria Leonor; Moutinho, Michell; Monteiro, Ricardo; Amaro, Raquel; Correia, Susana; Rodríguez Rey, Sandra; Mu, Keran; Garcia González, Marcos; Bernárdez Braña, André; Blanco Escoda, Xavier
    Description

    The Dataset 2: annotated corpora by level of complexity for FR, PT and SP is a collection of texts categorized by complexity level and annotated for complexity features, presented in Excel format (.xlsx). These corpora were compiled and annotated under the scope of the project iRead4Skills – Intelligent Reading Improvement System for Fundamental and Transversal Skills Development, funded by the European Commission (grant number: 1010094837). The project aims to enhance reading skills within the adult population by creating an intelligent system that assesses text complexity and recommends suitable reading materials to adults with low literacy skills, contributing to reducing skills gaps and facilitating access to information and culture (https://iread4skills.com).

    This dataset is the result of specifically devised classification and annotation tasks, in which selected texts were organized and distributed to trainers in Adult Learning (AL) and Vocational Education Training (VET) Centres, as well as to adult students in AL and VET centres. This task was conducted via the Qualtrics platform.

    The Dataset 2: annotated corpora by level of complexity for FR, PT and SP is derived from the iRead4Skills Dataset 1: corpora by level of complexity for FR, PT and SP ( https://doi.org/10.5281/zenodo.10055909), which comprises written texts of various genres and complexity levels. From this collection, a sample of texts was selected for classification and annotation. This classification and annotation task aimed to provide additional data and test sets for the complexity analysis systems for the three languages of the project: French, Portuguese, and Spanish. The sample texts in each of the language corpora were selected taking into account the diversity of topics/domains, genres, and the reading preferences of the target audience of the iRead4Skills project. This percentage amounted to the total of 462 texts per language, which were divided by level of complexity, resulting in the following distribution:

    • 140 Very Easy texts
    • 140 Easy texts
    • 140 Plain texts
    • 42 More Complex texts.

    Trainers and students were asked to classify the texts according to the complexity levels of the project, here informally defined as:

    • Very Easy (everyone can understand the text or most of the text).
    • Easy (a person with less than the 9th year of schooling can understand the text or most of the text)
    • Plain (a person with the 9th year of schooling can understand the text the first time he/she reads it)
    • More complex (a person with the 9th year of schooling cannot understand the text the first time he/she reads it).

    Annotators were also asked to mark the parts of the texts considered complex according to various types of features, at word level and at sentence level (e.g., word order, sentence composition, etc.). The full details regarding the students' and the trainers' tasks, the qualitative and quantitative description of the data, and inter-annotator agreement are described here: https://zenodo.org/records/14653180

    The results are here presented in Excel format. For each language, and for each group (trainers and students), two pairs of files exist – the annotation and the classification files – resulting in four files per language and twelve files, in total.

    In all files, the data is organized as a matrix, with each row representing an ‘answer’ from a particular participant, and the columns containing various details about that specific input, as shown below:

    Column name: Data

    Annotator's ID: The randomly generated ID code for each annotator, together with information on the dataset assigned to them.
    Progress: Information on the completion of the task (for each text).
    Duration (seconds): Time used in the completion of the task (for each text).
    File Name (N1 = Very Easy, N2 = Easy, N3 = Plain, N4 = More Complex): File internal identification, providing its iRead4Skills classification.
    Text: The content of the file, i.e. the text itself.
    Annotated Level: Level assigned by the annotator (trainer).
    Proficiency SubLevel (Likert Scale - 1 to 5): SubLevel assigned by the annotator (trainer) for FR data.
    Corresponding CEFR Level: CEFR level closest to the iRead4Skills
    Additional Info: Observations made by the trainers/students
    Annotated Term: Word or set of words selected for annotation
    Term Label: Annotation assigned to the Annotated Term (difficult word, word order, etc.)
    Term Index: Position of the annotated term in the text
    Annotator's Proficiency Level: Level of AL/VET of the student
    Text adequate for user: Validation of the text by the students

    The content of the column “File Name” is color-coded, where a green shade alludes to a text with a lower level of complexity and a red one alludes to one with a higher level of complexity.

    The complete datasets are available under a Creative Commons CC BY-NC-ND 4.0 license.

  19. Data from: From Chaos to Harmony: Addressing Data De-Noising, Complexity and...

    • curate.nd.edu
    pdf
    Updated Apr 28, 2025
    Cite
    Qianlong Wen (2025). From Chaos to Harmony: Addressing Data De-Noising, Complexity and Adaptability in Graph Machine Learning [Dataset]. http://doi.org/10.7274/28786127.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    University of Notre Dame
    Authors
    Qianlong Wen
    License

    https://www.law.cornell.edu/uscode/text/17/106

    Description

    Graph representation learning—especially via graph neural networks (GNNs)—has demonstrated considerable promise in modeling intricate interaction systems, such as social networks and molecular structures. However, the deployment of GNN-based frameworks in industrial settings remains challenging due to the inherent complexity and noise in real-world graph data. This dissertation systematically addresses these challenges by advancing novel methodologies to improve the comprehensiveness and robustness of graph representation learning, with a dual focus on resolving data complexity and denoising across diverse graph-learning scenarios. In addressing graph data denoising, we design auxiliary self-supervised optimization objectives that disentangle noisy topological structures and misinformation while preserving the representational sufficiency of critical graph features. These tasks operate synergistically with primary learning objectives to enhance robustness against data corruption. The efficacy of these techniques is demonstrated through their application to real-world opioid prescription time series data for predicting potential opioid over-prescription. To mitigate data complexity, the study investigates two complementary approaches: (1) multimodal fusion, which employs attentive integration of graph data with features from other modalities, and (2) hierarchical substructure mining, which extracts semantic patterns at multiple granularities to enhance model generalization in demanding contexts. Finally, the dissertation explores the adaptability of graph data in a range of practical applications, including E-commerce demand forecasting and recommendations, to further enhance prediction and reasoning capabilities.

  20. Global Statistical Analysis Software Market Size By Deployment Model, By...

    • verifiedmarketresearch.com
    Updated Mar 7, 2024
    Cite
    VERIFIED MARKET RESEARCH (2024). Global Statistical Analysis Software Market Size By Deployment Model, By Application, By Component, By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/statistical-analysis-software-market/
    Explore at:
    Dataset updated
    Mar 7, 2024
    Dataset provided by
    Verified Market Research (https://www.verifiedmarketresearch.com/)
    Authors
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2024 - 2030
    Area covered
    Global
    Description

    Statistical Analysis Software Market size was valued at USD 7,963.44 Million in 2023 and is projected to reach USD 13,023.63 Million by 2030, growing at a CAGR of 7.28% during the forecast period 2024-2030.
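
    The growth rate quoted above can be sanity-checked with the standard compound annual growth rate formula. This is a minimal sketch that assumes 2023 is the base year and 2030 the end year (seven compounding periods); the function and variable names are illustrative and not part of the source.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the description, in USD million.
start_musd = 7963.44    # 2023 valuation
end_musd = 13023.63     # 2030 projection
years = 2030 - 2023     # assumption: seven compounding periods

rate = cagr(start_musd, end_musd, years)
print(f"Implied CAGR: {rate:.2%}")
```

    Under that assumption the implied rate comes out at roughly 7.28%, which is consistent with the figure in the description.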

    Global Statistical Analysis Software Market Drivers

    The market drivers for the Statistical Analysis Software Market can be influenced by various factors. These may include:

    • Growing Data Complexity and Volume: The demand for sophisticated statistical analysis tools has been fueled by the exponential rise in data volume and complexity across a range of industries. Robust software solutions are necessary for organizations to evaluate and extract significant insights from huge datasets.
    • Growing Adoption of Data-Driven Decision-Making: Businesses are adopting a data-driven approach to decision-making at a faster rate. Utilizing statistical analysis tools, companies can extract meaningful insights from data to improve operational effectiveness and strategic planning.
    • Developments in Analytics and Machine Learning: As these fields continue to progress, statistical analysis software is now capable of more. These tools' increasing popularity can be attributed to features like sophisticated modeling and predictive analytics.
    • A greater emphasis is being placed on business intelligence: Analytics and business intelligence are now essential components of corporate strategy. In order to provide business intelligence tools for studying trends, patterns, and performance measures, statistical analysis software is essential.
    • Increasing Need in Life Sciences and Healthcare: Large volumes of data are produced by the life sciences and healthcare sectors, necessitating complex statistical analysis. The need for data-driven insights in clinical trials, medical research, and healthcare administration is driving the market for statistical analysis software.
    • Growth of Retail and E-Commerce: The retail and e-commerce industries use statistical analytic tools for inventory optimization, demand forecasting, and customer behavior analysis. The need for analytics tools is fueled in part by the expansion of online retail and data-driven marketing techniques.
    • Government Regulations and Initiatives: Statistical analysis is frequently required for regulatory reporting and compliance with government initiatives, particularly in the healthcare and finance sectors. In these regulated industries, statistical analysis software uptake is driven by this.
    • Big Data Analytics's Emergence: As big data analytics has grown in popularity, there has been a demand for advanced tools that can handle and analyze enormous datasets effectively. Software for statistical analysis is essential for deriving valuable conclusions from large amounts of data.
    • Demand for Real-Time Analytics: In order to make deft judgments fast, there is a growing need for real-time analytics. Many different businesses have a significant demand for statistical analysis software that provides real-time data processing and analysis capabilities.
    • Growing Awareness and Education: As more people become aware of the advantages of using statistical analysis in decision-making, its use has expanded across a range of academic and research institutions. The market for statistical analysis software is influenced by the academic sector.
    • Trends in Remote Work: As more people around the world work from home, they are depending more on digital tools and analytics to collaborate and make decisions. Software for statistical analysis makes it possible for distant teams to efficiently examine data and exchange findings.
