100+ datasets found
  1. Data from: Two Dimensional Mass Mapping as a General Method of Data...

    • acs.figshare.com
    xls
    Updated Jun 1, 2023
    Konstantin A. Artemenko; Alexander R. Zubarev; Tatiana Yu Samgina; Albert T. Lebedev; Mikhail M. Savitski; Roman A. Zubarev (2023). Two Dimensional Mass Mapping as a General Method of Data Representation in Comprehensive Analysis of Complex Molecular Mixtures [Dataset]. http://doi.org/10.1021/ac802532j.s002
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    ACS Publications
    Authors
    Konstantin A. Artemenko; Alexander R. Zubarev; Tatiana Yu Samgina; Albert T. Lebedev; Mikhail M. Savitski; Roman A. Zubarev
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    A recent proteomics-grade (95%+ sequence reliability) high-throughput de novo sequencing method utilizes the benefits of high resolution, high mass accuracy, and the use of two complementary fragmentation techniques: collision-activated dissociation (CAD) and electron capture dissociation (ECD). With this high-fidelity sequencing approach, hundreds of peptides can be sequenced de novo in a single LC−MS/MS experiment. The high productivity of the new analysis technique has revealed a new bottleneck which occurs in data representation. Here we suggest a new method of data analysis and visualization that presents a comprehensive picture of the peptide content including relative abundances and grouping into families. The 2D mass mapping consists of putting the molecular masses onto a two-dimensional bubble plot, with the relative monoisotopic mass defect and isotopic shift being the axes and with the bubble area proportional to the peptide abundance. Peptides belonging to the same family form a compact group on such a plot, so that the family identity can in many cases be determined from the molecular mass alone. The performance of the method is demonstrated on the high-throughput analysis of skin secretion from three frogs, Rana ridibunda, Rana arvalis, and Rana temporaria. Two dimensional mass maps simplify the task of global comparison between the species and make obvious the similarities and differences in the peptide contents that are obscure in traditional data presentation methods. Even biological activity of the peptide can sometimes be inferred from its position on the plot. Two dimensional mass mapping is a general method applicable to any complex mixture, peptide and nonpeptide alike.
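The bubble-plot coordinates described above can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the axis definitions (mass defect taken as monoisotopic mass minus a supplied nominal mass, isotopic shift taken as average mass minus monoisotopic mass, both divided by the monoisotopic mass), the `Peptide` record, and the `mass_map` helper are all assumptions made for this example. Only the rule that bubble area is proportional to abundance comes from the description.

```python
from dataclasses import dataclass


@dataclass
class Peptide:
    """Hypothetical record; field names are not taken from the dataset."""
    name: str
    monoisotopic_mass: float  # Da
    nominal_mass: int         # Da, integer mass supplied by the caller
    average_mass: float       # Da
    abundance: float          # arbitrary units


def mass_map(peptides, max_area=100.0):
    """Return one (x, y, area) triple per peptide.

    Assumed axis definitions (not confirmed by the source):
      x = (monoisotopic mass - nominal mass) / monoisotopic mass
      y = (average mass - monoisotopic mass) / monoisotopic mass
    The bubble area is proportional to abundance, scaled so that the
    most abundant peptide gets max_area.
    """
    top = max(p.abundance for p in peptides)
    points = []
    for p in peptides:
        x = (p.monoisotopic_mass - p.nominal_mass) / p.monoisotopic_mass
        y = (p.average_mass - p.monoisotopic_mass) / p.monoisotopic_mass
        area = max_area * p.abundance / top
        points.append((x, y, area))
    return points
```

The resulting triples can be handed to any scatter-plot routine that accepts a marker area; peptides from the same family would then be expected to fall close together on the plot.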

  2. Data from: Data_Sheet_1_An Active Data Representation of Videos for...

    • frontiersin.figshare.com
    • figshare.com
    pdf
    Updated May 31, 2023
    Fasih Haider; Maria Koutsombogera; Owen Conlan; Carl Vogel; Nick Campbell; Saturnino Luz (2023). Data_Sheet_1_An Active Data Representation of Videos for Automatic Scoring of Oral Presentation Delivery Skills and Feedback Generation.PDF [Dataset]. http://doi.org/10.3389/fcomp.2020.00001.s001
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers
    Authors
    Fasih Haider; Maria Koutsombogera; Owen Conlan; Carl Vogel; Nick Campbell; Saturnino Luz
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Public speaking is an important skill, the acquisition of which requires dedicated and time-consuming training. In recent years, researchers have started to investigate automatic methods to support public speaking skills training. These methods include assessment of the trainee's oral presentation delivery skills which may be accomplished through automatic understanding and processing of social and behavioral cues displayed by the presenter. In this study, we propose an automatic scoring system for presentation delivery skills using a novel active data representation method to automatically rate segments of a full video presentation. While most approaches have employed a two-step strategy consisting of detecting multiple events followed by classification, which involves the annotation of data for building the different event detectors and generating a data representation based on their output for classification, our method does not require event detectors. The proposed data representation is generated unsupervised using low-level audiovisual descriptors and self-organizing mapping and used for video classification. This representation is also used to analyse video segments within a full video presentation in terms of several characteristics of the presenter's performance. The audio representation provides the best prediction results for self-confidence and enthusiasm, posture and body language, structure and connection of ideas, and overall presentation delivery. The video data representation provides the best results for presentation of relevant information with good pronunciation, usage of language according to audience, and maintenance of adequate voice volume for the audience. The fusion of audio and video data provides the best results for eye contact. Applications of the method to provision of feedback to teachers and trainees are discussed.

  3. Bioinformatics Applications using a Multi-layered Data Representation

    • ieee-dataport.org
    Updated May 25, 2025
    Diogo Vieira (2025). Bioinformatics Applications using a Multi-layered Data Representation [Dataset]. https://ieee-dataport.org/documents/bioinformatics-applications-using-multi-layered-data-representation
    Dataset updated
    May 25, 2025
    Authors
    Diogo Vieira
    Description

    Data for all phases of the new method and its results. This is the final, published version of the results.

  4. Supplementary Materials for A Linked Data Representation for Summary...

    • dataverse.harvard.edu
    • dataone.org
    Updated Aug 28, 2019
    James McCusker (2019). Supplementary Materials for A Linked Data Representation for Summary Statistics and Grouping Criteria [Dataset]. http://doi.org/10.7910/DVN/OK0BUG
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 28, 2019
    Dataset provided by
    Harvard Dataverse
    Authors
    James McCusker
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Summary statistics are fundamental to data science, and are the building blocks of statistical reasoning. Most of the data and statistics made available on government web sites are aggregate; however, until now we have not had a suitable linked data representation available. We propose a way to express summary statistics across aggregate groups as linked data using Web Ontology Language (OWL) Class based sets, where members of the set contribute to the overall aggregate value. Additionally, many clinical studies in the biomedical field rely on demographic summaries of their study cohorts and the patients assigned to each arm. While most data query languages, including SPARQL, allow for computation of summary statistics, they do not provide a way to integrate those values back into the RDF graphs they were computed from. We represent this knowledge, that would otherwise be lost, through the use of OWL 2 punning semantics, the expression of aggregate grouping criteria as OWL classes with variables, and constructs from the Semanticscience Integrated Ontology (SIO), and the World Wide Web Consortium's provenance ontology, PROV-O, providing interoperable representations that are well supported across the web of Linked Data. We evaluate these semantics using a Resource Description Framework (RDF) representation of patient case information from the Genomic Data Commons, a data portal from the National Cancer Institute.

  5. Classification & presentation of Data

    • paper.erudition.co.in
    html
    Updated Nov 23, 2025
    Einetic (2025). Classification & presentation of Data [Dataset]. https://paper.erudition.co.in/makaut/bachelor-in-business-administration-2020-2021/3/business-research-methods
    Available download formats: html
    Dataset updated
    Nov 23, 2025
    Dataset authored and provided by
    Einetic
    License

    https://paper.erudition.co.in/terms

    Description

    Question Paper Solutions for the chapter Classification & presentation of Data of Business Research Methods, 3rd Semester, Bachelor in Business Administration 2020 - 2021

  6. Data from: Construction of symmetric group representation matrices and...

    • elsevier.digitalcommonsdata.com
    Updated Jan 1, 1981
    M.F. Soto (1981). Construction of symmetric group representation matrices and states [Dataset]. http://doi.org/10.17632/v4dcf4sf8z.1
    Dataset updated
    Jan 1, 1981
    Authors
    M.F. Soto
    License

    https://www.elsevier.com/about/policies/open-access-licenses/elsevier-user-license/cpc-license/

    Description

    Title of program: SYMSTATS
    Catalogue Id: AAME_v1_0

    Nature of problem: To find explicitly basis states and basis state operators for the symmetric groups, in a form applicable to any symmetric group and most useful for applications.

    Versions of this program held in the CPC repository in Mendeley Data: aame_v1_0; SYMSTATS; 10.1016/0010-4655(81)90132-6

    This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2019)

  7. Diagrammatic and Graphical representation of Numerical Data

    • paper.erudition.co.in
    html
    Updated Jun 1, 2021
    Einetic (2021). Diagrammatic and Graphical representation of Numerical Data [Dataset]. https://paper.erudition.co.in/makaut/bachelor-of-computer-application-2020-2021/5/numerical-and-statistical-methods
    Available download formats: html
    Dataset updated
    Jun 1, 2021
    Dataset authored and provided by
    Einetic
    License

    https://paper.erudition.co.in/terms

    Description

    Question Paper Solutions for the chapter Diagrammatic and Graphical representation of Numerical Data of Numerical and statistical Methods, 5th Semester, Bachelor of Computer Application 2020-2021

  8. A number of possible methods that can be used given the specific...

    • datasetcatalog.nlm.nih.gov
    Updated Nov 29, 2018
    Smilde, Age K.; de Rooi, Johan; Bønnelykke, Klaus; Bisgaard, Hans; Nørgaard, Sarah K.; Rasmussen, Morten A. (2018). A number of possible methods that can be used given the specific representation of the raw data. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000636829
    Dataset updated
    Nov 29, 2018
    Authors
    Smilde, Age K.; de Rooi, Johan; Bønnelykke, Klaus; Bisgaard, Hans; Nørgaard, Sarah K.; Rasmussen, Morten A.
    Description

    Methods discussed in the text are printed in bold type.

  9. Data from: Multimodal Learning on Graphs: Methods and Applications

    • curate.nd.edu
    Updated May 14, 2025
    Yihong Ma (2025). Multimodal Learning on Graphs: Methods and Applications [Dataset]. http://doi.org/10.7274/28792454.v1
    Dataset updated
    May 14, 2025
    Dataset provided by
    University of Notre Dame
    Authors
    Yihong Ma
    License

    https://www.law.cornell.edu/uscode/text/17/106

    Description

    Graph data represents complex relationships across diverse domains, from social networks to healthcare and chemical sciences. However, real-world graph data often spans multiple modalities, including time-varying signals from sensors, semantic information from textual representations, and domain-specific encodings. This dissertation introduces innovative multimodal learning techniques for graph-based predictive modeling, addressing the intricate nature of these multidimensional data representations. The research systematically advances graph learning through innovative methodological approaches across three critical modalities. Initially, we establish robust graph-based methodological foundations through advanced techniques including prompt tuning for heterogeneous graphs and a comprehensive framework for imbalanced learning on graph data. We then extend these methods to time series analysis, demonstrating their practical utility through applications such as hierarchical spatio-temporal modeling for COVID-19 forecasting and graph-based density estimation for anomaly detection in unmanned aerial systems. Finally, we explore textual representations of graphs in the chemical domain, reformulating reaction yield prediction as an imbalanced regression problem to enhance performance in underrepresented high-yield regions critical to chemists.

  10. Data from: New Deep Learning Methods for Medical Image Analysis and...

    • curate.nd.edu
    pdf
    Updated Nov 11, 2024
    Pengfei Gu (2024). New Deep Learning Methods for Medical Image Analysis and Scientific Data Generation and Compression [Dataset]. http://doi.org/10.7274/26156719.v1
    Available download formats: pdf
    Dataset updated
    Nov 11, 2024
    Dataset provided by
    University of Notre Dame
    Authors
    Pengfei Gu
    License

    https://www.law.cornell.edu/uscode/text/17/106

    Description

    Medical image analysis is critical to biological studies, health research, computer-aided diagnoses, and clinical applications. Recently, deep learning (DL) techniques have achieved remarkable successes in medical image analysis applications. However, these techniques typically require large amounts of annotations to achieve satisfactory performance. Therefore, in this dissertation, we seek to address this critical problem: How can we develop efficient and effective DL algorithms for medical image analysis while reducing annotation efforts? To address this problem, we have outlined two specific aims: (A1) Utilize existing annotations effectively from advanced models; (A2) extract generic knowledge directly from unannotated images.

    To achieve the aim (A1): First, we introduce a new data representation called TopoImages, which encodes the local topology of all the image pixels. TopoImages can be complemented with the original images to improve medical image analysis tasks. Second, we propose a new augmentation method, SAMAug-C, that leverages the Segment Anything Model (SAM) to augment raw image input and enhance medical image classification. Third, we propose two advanced DL architectures, kCBAC-Net and ConvFormer, to enhance the performance of 2D and 3D medical image segmentation. We also present a gate-regularized network training (GrNT) approach to improve multi-scale fusion in medical image segmentation. To achieve the aim (A2), we propose a novel extension of known Masked Autoencoders (MAEs) for self pre-training, i.e., models pre-trained on the same target dataset, specifically for 3D medical image segmentation.

    Scientific visualization is a powerful approach for understanding and analyzing various physical or natural phenomena, such as climate change or chemical reactions. However, the cost of scientific simulations is high when factors like time, ensemble, and multivariate analyses are involved. Additionally, scientists can only afford to sparsely store the simulation outputs (e.g., scalar field data) or visual representations (e.g., streamlines) or visualization images due to limited I/O bandwidths and storage space. Therefore, in this dissertation, we seek to address this critical problem: How can we develop efficient and effective DL algorithms for scientific data generation and compression while reducing simulation and storage costs?

    To tackle this problem: First, we propose a DL framework that generates unsteady vector fields data from a set of streamlines. Based on this method, domain scientists only need to store representative streamlines at simulation time and reconstruct vector fields during post-processing. Second, we design a novel DL method that translates scalar fields to vector fields. Using this approach, domain scientists only need to store scalar field data at simulation time and generate vector fields from their scalar field counterparts afterward. Third, we present a new DL approach that compresses a large collection of visualization images generated from time-varying data for communicating volume visualization results.

  11. Replication Data for: Reform and Representation: A New Method Applied to...

    • dataverse.harvard.edu
    Updated Sep 20, 2016
    Boris Shor; Thad Kousser; Justin Phillips (2016). Replication Data for: Reform and Representation: A New Method Applied to Recent Electoral Changes [Dataset]. http://doi.org/10.7910/DVN/5AYBI9
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 20, 2016
    Dataset provided by
    Harvard Dataverse
    Authors
    Boris Shor; Thad Kousser; Justin Phillips
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Can electoral reforms such as an independent redistricting commission and the top-two primary create conditions that lead to better legislative representation? We explore this question by presenting a new method for measuring a key indicator of representation – the congruence between a legislator's ideological position and the average position of her district's voters. Our novel approach combines two methods: the joint classification of voters and political candidates on the same ideological scale, along with multilevel regression and post-stratification to estimate the position of the average voter across many districts in multiple elections. After validating our approach, we use it to study the recent impact of reforms in California, showing that they did not bring their hoped-for effects.

  12. Classification of web-based Digital Humanities projects leveraging...

    • zenodo.org
    • data-staging.niaid.nih.gov
    csv, tsv
    Updated Nov 10, 2025
    Tommaso Battisti; Tommaso Battisti (2025). Classification of web-based Digital Humanities projects leveraging information visualisation techniques [Dataset]. http://doi.org/10.5281/zenodo.14192758
    Available download formats: tsv, csv
    Dataset updated
    Nov 10, 2025
    Dataset provided by
    Zenodo
    Authors
    Tommaso Battisti; Tommaso Battisti
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description

    This dataset contains a list of 186 Digital Humanities projects leveraging information visualisation techniques. Each project has been classified according to visualisation and interaction methods, narrativity and narrative solutions, domain, methods for the representation of uncertainty and interpretation, and the employment of critical and custom approaches to visually represent humanities data.

    Classification schema: categories and columns

    The project_id column contains unique internal identifiers assigned to each project. Meanwhile, the last_access column records the most recent date (in DD/MM/YYYY format) on which each project was reviewed based on the web address specified in the url column.
    The remaining columns can be grouped into descriptive categories aimed at characterising projects according to different aspects:

    Narrativity. It reports the presence of information visualisation techniques employed within narrative structures. Here, the term narrative encompasses both author-driven linear data stories and more user-directed experiences where the narrative sequence is determined by user exploration [1]. We define 2 columns to identify projects using visualisation techniques in narrative, or non-narrative sections. Both conditions can be true for projects employing visualisations in both contexts. Columns:

    • non_narrative (boolean)

    • narrative (boolean)

    Domain. The humanities domain to which the project is related. We rely on [2] and the chapters of the first part of [3] to abstract a set of general domains. Column:

    • domain (categorical):

      • History and archaeology

      • Art and art history

      • Language and literature

      • Music and musicology

      • Multimedia and performing arts

      • Philosophy and religion

      • Other: both extra-list domains and cases of collections without a unique or specific thematic focus.

    Visualisation of uncertainty and interpretation. Building upon the frameworks proposed by [4] and [5], a set of categories was identified, highlighting a distinction between precise and impressional communication of uncertainty. Precise methods explicitly represent quantifiable uncertainty such as missing, unknown, or uncertain data, precisely locating and categorising it using visual variables and positioning. Two sub-categories are interactive distinction, when uncertain data is not visually distinguishable from the rest of the data but can be dynamically isolated or included/excluded categorically through interaction techniques (usually filters); and visual distinction, when uncertainty visually “emerges” from the representation by means of dedicated glyphs and spatial or visual cues and variables. On the other hand, impressional methods communicate the constructed and situated nature of data [6], exposing the interpretative layer of the visualisation and indicating more abstract and unquantifiable uncertainty using graphical aids or interpretative metrics. Two sub-categories are: ambiguation, when the use of graphical expedients (like permeable glyph boundaries or broken lines) visually conveys the ambiguity of a phenomenon; and interpretative metrics, when expressive, non-scientific, or non-punctual metrics are used to build a visualisation. Column:

    • uncertainty_interpretation (categorical):

      • Interactive distinction

      • Visual distinction

      • Ambiguation

      • Interpretative metrics

    Critical adaptation. We identify projects in which, with regard to at least one visualisation, the following criteria are fulfilled: 1) avoiding the repurposing of prepackaged, generic-use, or ready-made solutions; 2) being tailored and unique to reflect the peculiarities of the phenomena at hand; 3) avoiding simplifications in order to embrace and depict complexity, promoting time-consuming visualisation-based inquiry. Column:

    • critical_adaptation (boolean)

    Non-temporal visualisation techniques. We adopt and partially adapt the terminology and definitions from [7]. A column is defined for each type of visualisation and accounts for its presence within a project, also including stacked layouts and more complex variations. Columns and inclusion criteria:

    • plot (boolean): visual representations that map data points onto a two-dimensional coordinate system.

    • cluster_or_set (boolean): sets or cluster-based visualisations used to unveil possible inter-object similarities.

    • map (boolean): geographical maps used to show spatial insights. While we do not specify the variants of maps (e.g., pin maps, dot density maps, flow maps, etc.), we make an exception for maps where each data point is represented by another visualisation (e.g., a map where each data point is a pie chart) by accounting for the presence of both in their respective columns.

    • network (boolean): visual representations highlighting relational aspects through nodes connected by links or edges.

    • hierarchical_diagram (boolean): tree-like structures such as tree diagrams, radial trees, but also dendrograms. They differ from networks for their strictly hierarchical structure and absence of closed connection loops.

    • treemap (boolean): still hierarchical, but highlighting quantities expressed by means of area size. It also includes circle packing variants.

    • word_cloud (boolean): clouds of words, where each instance’s size is proportional to its frequency in a related context

    • bars (boolean): includes bar charts, histograms, and variants. It coincides with “bar charts” in [7] but with a more generic term to refer to all bar-based visualisations.

    • line_chart (boolean): the display of information as sequential data points connected by straight-line segments.

    • area_chart (boolean): similar to a line chart but with a filled area below the segments. It also includes density plots.

    • pie_chart (boolean): circular graphs divided into slices which can also use multi-level solutions.

    • plot_3d (boolean): plots that use a third dimension to encode an additional variable.

    • proportional_area (boolean): representations used to compare values through area size. Typically, using circle- or square-like shapes.

    • other (boolean): it includes all other types of non-temporal visualisations that do not fall into the aforementioned categories.

    Temporal visualisations and encodings. In addition to non-temporal visualisations, a group of techniques to encode temporality is considered in order to enable comparisons with [7]. Columns:

    • timeline (boolean): the display of a list of data points or spans in chronological order. They include timelines working either with a scale or simply displaying events in sequence. As in [7], we also include structured solutions resembling Gantt chart layouts.

    • temporal_dimension (boolean): to report when time is mapped to any dimension of a visualisation, with the exclusion of timelines. We use the term “dimension” and not “axis” as in [7] as more appropriate for radial layouts or more complex representational choices.

    • animation (boolean): temporality is perceived through an animation changing the visualisation according to time flow.

    • visual_variable (boolean): another visual encoding strategy is used to represent any temporality-related variable (e.g., colour).

    Interactions. A set of categories to assess affordable interactions based on the concept of user intent [8] and user-allowed perceptualisation data actions [9]. The following categories roughly match the manipulative subset of methods of the “how” an interaction is performed in the conception of [10]. Only interactions that affect the aspect of the visualisation or the visual representation of its data points, symbols, and glyphs are taken into consideration. Columns:

    • basic_selection (boolean): the demarcation of an element either for the duration of the interaction or more permanently until the occurrence of another selection.

    • advanced_selection (boolean): the demarcation involves both the selected element and connected elements within the visualisation or leads to brush and link effects across views. Basic selection is tacitly implied.

    • navigation (boolean): interactions that allow moving, zooming, panning, rotating, and scrolling the view but only when applied to the visualisation and not to the web page. It also includes “drill” interactions (to navigate through different levels or portions of data detail, often generating a new view that replaces or accompanies the original) and “expand” interactions generating new perspectives on data by expanding and collapsing nodes.

    • arrangement (boolean): the organisation of visualisation elements (symbols, glyphs, etc.) or multi-visualisation layouts spatially through drag and drop or
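
As a rough illustration of how the schema above might be read, the sketch below parses a tab-separated table, converts `last_access` from DD/MM/YYYY into a date, and turns a few of the boolean columns into Python booleans. The column names come from the description; the sample rows, the `load_projects` helper, and the assumption that booleans are stored as the strings "True" and "False" are made up for this example and may not match the actual files.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows; only the column names come from the description.
SAMPLE = (
    "project_id\tlast_access\turl\tnarrative\tnon_narrative\tdomain\tmap\n"
    "p001\t05/11/2024\thttps://example.org/a\tTrue\tFalse\tHistory and archaeology\tTrue\n"
    "p002\t17/03/2024\thttps://example.org/b\tFalse\tTrue\tMusic and musicology\tFalse\n"
)

# A subset of the boolean columns, enough for the illustration.
BOOLEAN_COLUMNS = ("narrative", "non_narrative", "map")


def parse_bool(text):
    # Assumption: booleans are serialised as "True"/"False".
    return text.strip().lower() == "true"


def load_projects(handle, delimiter="\t"):
    """Read rows, converting last_access (DD/MM/YYYY) and boolean flags."""
    rows = []
    for raw in csv.DictReader(handle, delimiter=delimiter):
        row = dict(raw)
        row["last_access"] = datetime.strptime(raw["last_access"], "%d/%m/%Y").date()
        for col in BOOLEAN_COLUMNS:
            row[col] = parse_bool(raw[col])
        rows.append(row)
    return rows


projects = load_projects(io.StringIO(SAMPLE))
# Example query: identifiers of projects that use a geographical map.
map_projects = [p["project_id"] for p in projects if p["map"]]
```

Parsing the date with an explicit `%d/%m/%Y` format avoids the day/month ambiguity that automatic date detection can introduce.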

  13. Data from: Vector representations of polish words (Word2Vec method)

    • live.european-language-grid.eu
    binary format
    Updated Nov 6, 2016
    (2016). Vector representations of polish words (Word2Vec method) [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/20234
    Available download formats: binary format
    Dataset updated
    Nov 6, 2016
    License

    https://www.gnu.org/licenses/lgpl-3.0-standalone.html

    Description

    Skip-gram model with vectors of length 100. Trained on kgr 10, a corpus with over 4 billion tokens. Data preprocessing involved segmentation, lemmatization and morphosyntactic disambiguation with MWE annotation.

  14. Sales Data Presentation - Dashboards

    • kaggle.com
    zip
    Updated Nov 29, 2023
    Satya Manidhar V (2023). Sales Data Presentation - Dashboards [Dataset]. https://www.kaggle.com/datasets/satyamanidharv/sales-data-presentation-dashboards
    Available download formats: zip (763979 bytes)
    Dataset updated
    Nov 29, 2023
    Authors
    Satya Manidhar V
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    In today's data-driven world, extracting meaningful insights from vast amounts of information is crucial for informed decision-making. This presentation tackles the challenge of creating presentable data visualizations based on employee type and region of sales.

    Leveraging the power of PivotTables in Microsoft Excel, we will delve into a comprehensive approach to transforming raw sales data into compelling visual representations. By mastering PivotTable techniques, we will gain insights into employee sales trends, identify top performers, and uncover regional sales patterns.
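
The dataset itself works with Excel PivotTables. Purely as an illustration of the aggregation such a table performs, the sketch below sums sales by employee type and region in plain Python. The records, field names, and the `pivot_sum` helper are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict

# Made-up records; the field names are hypothetical.
SALES = [
    {"employee_type": "Full-time", "region": "North", "amount": 1200.0},
    {"employee_type": "Full-time", "region": "South", "amount": 800.0},
    {"employee_type": "Contract", "region": "North", "amount": 300.0},
    {"employee_type": "Full-time", "region": "North", "amount": 500.0},
]


def pivot_sum(records, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) pair, like a pivot table."""
    table = defaultdict(lambda: defaultdict(float))
    for rec in records:
        table[rec[row_key]][rec[col_key]] += rec[value_key]
    # Convert to plain dicts so the result is easy to print or compare.
    return {row: dict(cols) for row, cols in table.items()}


pivot = pivot_sum(SALES, "employee_type", "region", "amount")
```

Each outer key corresponds to a pivot row (employee type) and each inner key to a pivot column (region).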

  15. Controlled feature selection and compressive big data analytics:...

    • plos.figshare.com
    docx
    Updated May 30, 2023
    Simeone Marino; Jiachen Xu; Yi Zhao; Nina Zhou; Yiwang Zhou; Ivo D. Dinov (2023). Controlled feature selection and compressive big data analytics: Applications to biomedical and health studies [Dataset]. http://doi.org/10.1371/journal.pone.0202674
    Explore at:
    Available download formats: docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Simeone Marino; Jiachen Xu; Yi Zhao; Nina Zhou; Yiwang Zhou; Ivo D. Dinov
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The theoretical foundations of Big Data Science are not yet fully developed. This study proposes a new scalable framework for Big Data representation, high-throughput analytics (variable selection and noise reduction), and model-free inference. Specifically, we explore the core principles of distribution-free and model-agnostic methods for scientific inference based on Big Data sets. Compressive Big Data analytics (CBDA) iteratively generates random (sub)samples from a big and complex dataset. This subsampling with replacement is conducted on the feature and case levels and results in samples that are not necessarily consistent or congruent across iterations. The approach relies on an ensemble predictor where established model-based or model-free inference techniques are iteratively applied to preprocessed and harmonized samples. Repeating the subsampling and prediction steps many times yields derived likelihoods, probabilities, or parameter estimates, which can be used to assess the algorithm reliability and accuracy of findings via bootstrapping methods, or to extract important features via controlled variable selection. CBDA provides a scalable algorithm for addressing some of the challenges associated with handling complex, incongruent, incomplete and multi-source data and analytics. Although not yet fully developed, a CBDA mathematical framework will enable the study of the ergodic properties and the asymptotics of the specific statistical inference approaches via CBDA. We implemented the high-throughput CBDA method using pure R as well as via the graphical pipeline environment. To validate the technique, we used several simulated datasets as well as a real neuroimaging-genetics case study of Alzheimer’s disease. The CBDA approach may be customized to provide generic representation of complex multimodal datasets and to provide stable scientific inference for large, incomplete, and multisource datasets.
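    The subsample-predict-rank loop described above can be illustrated with a small sketch. This is not the authors' R implementation: the nearest-centroid predictor, the function names (`nearest_centroid_accuracy`, `rank_features`), the parameter values, and the synthetic data are all assumptions chosen to keep the example short. Cases are drawn with replacement; features are drawn without repeats within a single subsample, which is a simplification.

```python
import random
from collections import Counter

def nearest_centroid_accuracy(X, y, train, test, feats):
    """Fit class centroids on the `train` rows, return accuracy on the `test` rows."""
    centroids = {}
    for label in set(y[i] for i in train):
        rows = [i for i in train if y[i] == label]
        centroids[label] = [sum(X[i][f] for i in rows) / len(rows) for f in feats]
    correct = 0
    for i in test:
        point = [X[i][f] for f in feats]
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
        )
        correct += pred == y[i]
    return correct / len(test)

def rank_features(X, y, n_iter=300, n_cases=30, n_feats=3, top_frac=0.2, seed=0):
    """Hypothetical CBDA-style ranking: count features in the best-scoring subsamples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    runs = []
    for _ in range(n_iter):
        train = [rng.randrange(n) for _ in range(n_cases)]  # cases, with replacement
        feats = rng.sample(range(p), n_feats)               # random feature subset
        train_set = set(train)
        held_out = [i for i in range(n) if i not in train_set]
        if not held_out:
            continue
        runs.append((nearest_centroid_accuracy(X, y, train, held_out, feats), feats))
    runs.sort(key=lambda run: run[0], reverse=True)
    top = runs[: max(1, int(top_frac * len(runs)))]
    counts = Counter(f for _, feats in top for f in feats)
    return counts.most_common()

# Synthetic data (an assumption): feature 0 carries the signal, features 1-9 are noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(100)]
X = [[4.0 * label + data_rng.gauss(0, 1)] + [data_rng.gauss(0, 1) for _ in range(9)]
     for label in y]
ranking = rank_features(X, y)
print(ranking[:3])
```

    Features that recur in the best-performing subsamples are the candidates for selection; in this toy setup the informative feature should dominate the count.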

  16. Data from: Construction of symmetric group representation matrices and...

    • elsevier.digitalcommonsdata.com
    Updated Jan 1, 1981
    Cite
    M.F. Soto (1981). Construction of symmetric group representation matrices and states [Dataset]. http://doi.org/10.17632/48cg7gf5sy.1
    Explore at:
    Dataset updated
    Jan 1, 1981
    Authors
    M.F. Soto
    License

    https://www.elsevier.com/about/policies/open-access-licenses/elsevier-user-license/cpc-license/

    Description

    Title of program: SYMRPMAT
    Catalogue Id: AAMF_v1_0

    Nature of problem: To find the explicit representation matrices for every permutation, for every representation, for any symmetric group.

    Versions of this program held in the CPC repository in Mendeley Data: aamf_v1_0; SYMRPMAT; 10.1016/0010-4655(81)90132-6

    This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2019)

  17. Replication Data for: Candidate Supply is Not a Barrier to Immigrant...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 22, 2023
    Cite
    Dancygier, Rafaela; Lindgren, Karl-Oskar; Nyman, Pär; Vernby, Kåre (2023). Replication Data for: Candidate Supply is Not a Barrier to Immigrant Representation: A Case–Control Study [Dataset]. http://doi.org/10.7910/DVN/7GPSCA
    Explore at:
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Dancygier, Rafaela; Lindgren, Karl-Oskar; Nyman, Pär; Vernby, Kåre
    Description

    Immigrants are underrepresented in most democratic parliaments. To explain the immigrant-native representation gap, existing research emphasizes party gatekeepers and structural conditions. But a more complete account must consider the possibility that the representation gap already begins at the supply stage. Are immigrants simply less interested in elected office? To test this explanation, we carried out an innovative case-control survey in Sweden. We surveyed elected politicians, candidates for local office, and residents who have not run, stratified these samples by immigrant status, and linked all respondents to local political opportunity structures. We find that differences in political ambition, interest, and efficacy do not help explain immigrants’ underrepresentation. Instead, the major hurdles lie in securing a candidate nomination and being placed on an electable list position. We conclude that there is a sufficient supply of potential immigrant candidates, but immigrants' ambition is thwarted by political elites.

  18. Replication data for: Multidimensional Representation

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 22, 2023
    Cite
    Wolkenstein, Fabio; Wratil, Christopher (2023). Replication data for: Multidimensional Representation [Dataset]. http://doi.org/10.7910/DVN/79RI4R
    Explore at:
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Wolkenstein, Fabio; Wratil, Christopher
    Description

    The study of representation is a major research field in quantitative political science. Since the early 2000s, it has been accompanied by a range of important conceptual innovations by political theorists working on the topic. Yet, although many quantitative scholars are familiar with the conceptual literature, even the most complex quantitative studies eschew engaging with the “new wave” of more sophisticated concepts of representation that theorists have developed. We discuss what we take to be the main reasons for this gap between theory and empirics, and present four novel conceptions of representation that are both sensitive to theorists’ conceptual impulses and operationalizable for quantitative scholars. In doing so, we advance an alternative research agenda on representation that moves significantly beyond the status quo of the field.

  19. Data from: Simplicity of K-means versus deepness of Deep Learning. A Case of...

    • purr.purdue.edu
    Updated Oct 7, 2016
    Cite
    murat dundar; Qiang Kou; Baichuan Zhang; Yicheng He; Bartlomiej Rajwa (2016). Simplicity of K-means versus deepness of Deep Learning. A Case of Unsupervised Feature Learning with Limited Data [Dataset]. http://doi.org/10.4231/R7N58J9Z
    Explore at:
    Dataset updated
    Oct 7, 2016
    Dataset provided by
    PURR
    Authors
    murat dundar; Qiang Kou; Baichuan Zhang; Yicheng He; Bartlomiej Rajwa
    Description

    A study contrasting K-means-based unsupervised feature learning and deep learning techniques for small data sets with limited intra- as well as inter-class diversity

  20. Data from: Data-based System Representation and Synchronization for...

    • data.uni-hannover.de
    zip
    Updated Aug 7, 2024
    Cite
    Institut für Regelungstechnik (2024). Data-based System Representation and Synchronization for Multiagent Systems [Dataset]. https://data.uni-hannover.de/dataset/data-based-system-representation-and-synchronization-for-multiagent-systems
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 7, 2024
    Dataset authored and provided by
    Institut für Regelungstechnik
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Description

    [*] Victor G. Lopez and Matthias A. Müller, "Data-based System Representation and Synchronization for Multiagent Systems". Proceedings of the IEEE Conference on Decision and Control 2024.

    These codes simulate the solutions proposed in [*] for the data-based synchronization problem of homogeneous and heterogeneous agents. The file DBsynchronization_homogeneous.m corresponds to the method described in Section III for a multiagent system with one leader and three followers. This example was not shown in [*] for space reasons. The file DBsynchronization_heterogeneous.m uses the method in Section IV for a multiagent system with one leader and four followers, as shown in the paper.
