100+ datasets found

Data from: Two Dimensional Mass Mapping as a General Method of Data...
acs.figshare.com
xls
Updated Jun 1, 2023
Konstantin A. Artemenko; Alexander R. Zubarev; Tatiana Yu Samgina; Albert T. Lebedev; Mikhail M. Savitski; Roman A. Zubarev (2023). Two Dimensional Mass Mapping as a General Method of Data Representation in Comprehensive Analysis of Complex Molecular Mixtures [Dataset]. http://doi.org/10.1021/ac802532j.s002
Available download formats: xls
Unique identifier
https://doi.org/10.1021/ac802532j.s002
Dataset updated
Jun 1, 2023
Dataset provided by
ACS Publications
Authors
Konstantin A. Artemenko; Alexander R. Zubarev; Tatiana Yu Samgina; Albert T. Lebedev; Mikhail M. Savitski; Roman A. Zubarev
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
A recent proteomics-grade (95%+ sequence reliability) high-throughput de novo sequencing method exploits high resolution, high mass accuracy, and two complementary fragmentation techniques: collision-activated dissociation (CAD) and electron capture dissociation (ECD). With this high-fidelity sequencing approach, hundreds of peptides can be sequenced de novo in a single LC−MS/MS experiment. The high productivity of the new analysis technique has revealed a new bottleneck in data representation. Here we suggest a new method of data analysis and visualization that presents a comprehensive picture of the peptide content, including relative abundances and grouping into families. 2D mass mapping consists of placing the molecular masses on a two-dimensional bubble plot, with the relative monoisotopic mass defect and the isotopic shift as the axes and with the bubble area proportional to the peptide abundance. Peptides belonging to the same family form a compact group on such a plot, so that family identity can in many cases be determined from the molecular mass alone. The performance of the method is demonstrated on the high-throughput analysis of skin secretion from three frogs: Rana ridibunda, Rana arvalis, and Rana temporaria. Two-dimensional mass maps simplify global comparison between the species and make obvious the similarities and differences in peptide content that are obscured in traditional data presentation methods. Even the biological activity of a peptide can sometimes be inferred from its position on the plot. Two-dimensional mass mapping is a general method applicable to any complex mixture, peptide and nonpeptide alike.
Data from: Data_Sheet_1_An Active Data Representation of Videos for...
frontiersin.figshare.com
figshare.com
pdf
Updated May 31, 2023
Fasih Haider; Maria Koutsombogera; Owen Conlan; Carl Vogel; Nick Campbell; Saturnino Luz (2023). Data_Sheet_1_An Active Data Representation of Videos for Automatic Scoring of Oral Presentation Delivery Skills and Feedback Generation.PDF [Dataset]. http://doi.org/10.3389/fcomp.2020.00001.s001
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fcomp.2020.00001.s001
Dataset updated
May 31, 2023
Dataset provided by
Frontiers
Authors
Fasih Haider; Maria Koutsombogera; Owen Conlan; Carl Vogel; Nick Campbell; Saturnino Luz
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Public speaking is an important skill whose acquisition requires dedicated and time-consuming training. In recent years, researchers have started to investigate automatic methods to support public speaking skills training. These methods include assessing the trainee's oral presentation delivery skills, which may be accomplished through automatic understanding and processing of the social and behavioral cues displayed by the presenter. In this study, we propose an automatic scoring system for presentation delivery skills that uses a novel active data representation method to automatically rate segments of a full video presentation. Most approaches have employed a two-step strategy of detecting multiple events and then classifying them; this requires annotating data to build the different event detectors and generating a data representation from their output for classification. Our method, by contrast, does not require event detectors. The proposed data representation is generated without supervision from low-level audiovisual descriptors using self-organizing maps, and is used for video classification. This representation is also used to analyse video segments within a full video presentation in terms of several characteristics of the presenter's performance. The audio representation provides the best prediction results for self-confidence and enthusiasm, posture and body language, structure and connection of ideas, and overall presentation delivery. The video representation provides the best results for presentation of relevant information with good pronunciation, usage of language appropriate to the audience, and maintenance of adequate voice volume for the audience. The fusion of audio and video data provides the best results for eye contact. Applications of the method to providing feedback to teachers and trainees are discussed.
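The general idea of a detector-free, unsupervised representation (cluster frame-level descriptors with a self-organizing map, then describe each segment by how its frames fall on the map) can be sketched as follows. This is a minimal illustration, not the authors' active data representation: it assumes a small 1-D SOM with a Gaussian neighbourhood and takes the normalised histogram of best-matching units as the segment vector; the function names `train_som` and `segment_histogram` and the random stand-in descriptors are hypothetical.

```python
import numpy as np

def train_som(frames: np.ndarray, n_units: int = 8, epochs: int = 20,
              lr0: float = 0.5, sigma0: float = 2.0, seed: int = 0) -> np.ndarray:
    """Train a 1-D self-organizing map on frame-level descriptors (rows of `frames`)."""
    rng = np.random.default_rng(seed)
    # Initialise unit weights from randomly chosen frames.
    weights = frames[rng.choice(len(frames), n_units, replace=False)].astype(float)
    positions = np.arange(n_units)  # unit coordinates on the 1-D map
    total = epochs * len(frames)
    step = 0
    for _ in range(epochs):
        for x in frames[rng.permutation(len(frames))]:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood width
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            h = np.exp(-((positions - bmu) ** 2) / (2 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def segment_histogram(segment: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Represent a segment as the normalised count of its frames mapped to each unit."""
    dists = np.linalg.norm(segment[:, None, :] - weights[None, :, :], axis=2)
    bmus = np.argmin(dists, axis=1)
    counts = np.bincount(bmus, minlength=len(weights)).astype(float)
    return counts / counts.sum()

# Hypothetical usage: random numbers stand in for low-level audiovisual descriptors.
rng = np.random.default_rng(1)
frames = rng.normal(size=(200, 5))
som = train_som(frames)
features = segment_histogram(frames[:50], som)  # fixed-length vector for a classifier
```

Because every segment maps to a vector of the same length regardless of its duration, the output can be fed directly to any standard classifier without training event detectors.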
Bioinformatics Applications using a Multi-layered Data Representation
ieee-dataport.org
Updated May 25, 2025
Diogo Vieira (2025). Bioinformatics Applications using a Multi-layered Data Representation [Dataset]. https://ieee-dataport.org/documents/bioinformatics-applications-using-multi-layered-data-representation
Dataset updated
May 25, 2025
Authors
Diogo Vieira
Description
Data for all phases of the new method and its results. This is the final, published version of the results.
Supplementary Materials for A Linked Data Representation for Summary...
dataverse.harvard.edu
dataone.org
Updated Aug 28, 2019
James McCusker (2019). Supplementary Materials for A Linked Data Representation for Summary Statistics and Grouping Criteria [Dataset]. http://doi.org/10.7910/DVN/OK0BUG
Available format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/OK0BUG
Dataset updated
Aug 28, 2019
Dataset provided by
Harvard Dataverse
Authors
James McCusker
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Summary statistics are fundamental to data science and are the building blocks of statistical reasoning. Most of the data and statistics made available on government websites are aggregate; until now, however, we have not had a suitable linked data representation for them. We propose a way to express summary statistics across aggregate groups as linked data using Web Ontology Language (OWL) class-based sets, where members of the set contribute to the overall aggregate value. Additionally, many clinical studies in the biomedical field rely on demographic summaries of their study cohorts and of the patients assigned to each arm. While most data query languages, including SPARQL, allow summary statistics to be computed, they do not provide a way to integrate those values back into the RDF graphs they were computed from. We represent this knowledge, which would otherwise be lost, through OWL 2 punning semantics, the expression of aggregate grouping criteria as OWL classes with variables, and constructs from the Semanticscience Integrated Ontology (SIO) and the World Wide Web Consortium's provenance ontology, PROV-O, providing interoperable representations that are well supported across the web of Linked Data. We evaluate these semantics using a Resource Description Framework (RDF) representation of patient case information from the Genomic Data Commons, a data portal from the National Cancer Institute.
Classification & presentation of Data
paper.erudition.co.in
html
Updated Nov 23, 2025
Einetic (2025). Classification & presentation of Data [Dataset]. https://paper.erudition.co.in/makaut/bachelor-in-business-administration-2020-2021/3/business-research-methods
Available download formats: html
Dataset updated
Nov 23, 2025
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/terms
Description
Question Paper Solutions of chapter Classification & presentation of Data of Business Research Methods, 3rd Semester, Bachelor in Business Administration 2020-2021
Data from: Construction of symmetric group representation matrices and...
elsevier.digitalcommonsdata.com
Updated Jan 1, 1981
M.F. Soto (1981). Construction of symmetric group representation matrices and states [Dataset]. http://doi.org/10.17632/v4dcf4sf8z.1
Unique identifier
https://doi.org/10.17632/v4dcf4sf8z.1
Dataset updated
Jan 1, 1981
Authors
M.F. Soto
License
https://www.elsevier.com/about/policies/open-access-licenses/elsevier-user-license/cpc-license/
Description
Title of program: SYMSTATS
Catalogue Id: AAME_v1_0

Nature of problem: To find explicit basis states and basis state operators for the symmetric groups, in a form applicable to any symmetric group and most useful for applications.

Versions of this program held in the CPC repository in Mendeley Data: aame_v1_0; SYMSTATS; 10.1016/0010-4655(81)90132-6

This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2019)
Diagrammatic and Graphical representation of Numerical Data
paper.erudition.co.in
html
Updated Jun 1, 2021
Einetic (2021). Diagrammatic and Graphical representation of Numerical Data [Dataset]. https://paper.erudition.co.in/makaut/bachelor-of-computer-application-2020-2021/5/numerical-and-statistical-methods
Available download formats: html
Dataset updated
Jun 1, 2021
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/terms
Description
Question Paper Solutions of chapter Diagrammatic and Graphical representation of Numerical Data of Numerical and statistical Methods, 5th Semester, Bachelor of Computer Application 2020-2021
A number of possible methods that can be used given the specific...
datasetcatalog.nlm.nih.gov
Updated Nov 29, 2018
Smilde, Age K.; de Rooi, Johan; Bønnelykke, Klaus; Bisgaard, Hans; Nørgaard, Sarah K.; Rasmussen, Morten A. (2018). A number of possible methods that can be used given the specific representation of the raw data. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000636829
Dataset updated
Nov 29, 2018
Authors
Smilde, Age K.; de Rooi, Johan; Bønnelykke, Klaus; Bisgaard, Hans; Nørgaard, Sarah K.; Rasmussen, Morten A.
Description
Methods discussed in the text are printed in bold type.
Data from: Multimodal Learning on Graphs: Methods and Applications
curate.nd.edu
Updated May 14, 2025
Yihong Ma (2025). Multimodal Learning on Graphs: Methods and Applications [Dataset]. http://doi.org/10.7274/28792454.v1
Unique identifier
https://doi.org/10.7274/28792454.v1
Dataset updated
May 14, 2025
Dataset provided by
University of Notre Dame
Authors
Yihong Ma
License
https://www.law.cornell.edu/uscode/text/17/106
Description
Graph data represents complex relationships across diverse domains, from social networks to healthcare and chemical sciences. However, real-world graph data often spans multiple modalities, including time-varying signals from sensors, semantic information from textual representations, and domain-specific encodings. This dissertation introduces innovative multimodal learning techniques for graph-based predictive modeling, addressing the intricate nature of these multidimensional data representations. The research systematically advances graph learning through new methodological approaches across three critical modalities. First, we establish robust graph-based methodological foundations through advanced techniques, including prompt tuning for heterogeneous graphs and a comprehensive framework for imbalanced learning on graph data. We then extend these methods to time series analysis, demonstrating their practical utility through applications such as hierarchical spatio-temporal modeling for COVID-19 forecasting and graph-based density estimation for anomaly detection in unmanned aerial systems. Finally, we explore textual representations of graphs in the chemical domain, reformulating reaction yield prediction as an imbalanced regression problem to enhance performance in the underrepresented high-yield regions critical to chemists.
Data from: New Deep Learning Methods for Medical Image Analysis and...
curate.nd.edu
pdf
Updated Nov 11, 2024
Pengfei Gu (2024). New Deep Learning Methods for Medical Image Analysis and Scientific Data Generation and Compression [Dataset]. http://doi.org/10.7274/26156719.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.7274/26156719.v1
Dataset updated
Nov 11, 2024
Dataset provided by
University of Notre Dame
Authors
Pengfei Gu
License
https://www.law.cornell.edu/uscode/text/17/106
Description
Medical image analysis is critical to biological studies, health research, computer-aided diagnoses, and clinical applications. Recently, deep learning (DL) techniques have achieved remarkable successes in medical image analysis applications. However, these techniques typically require large amounts of annotations to achieve satisfactory performance. Therefore, in this dissertation, we seek to address this critical problem: How can we develop efficient and effective DL algorithms for medical image analysis while reducing annotation efforts? To address this problem, we have outlined two specific aims: (A1) utilize existing annotations effectively from advanced models; (A2) extract generic knowledge directly from unannotated images.

To achieve aim (A1): First, we introduce a new data representation called TopoImages, which encodes the local topology of all the image pixels. TopoImages can be combined with the original images to improve medical image analysis tasks. Second, we propose a new augmentation method, SAMAug-C, that leverages the Segment Anything Model (SAM) to augment raw image input and enhance medical image classification. Third, we propose two advanced DL architectures, kCBAC-Net and ConvFormer, to enhance the performance of 2D and 3D medical image segmentation. We also present a gate-regularized network training (GrNT) approach to improve multi-scale fusion in medical image segmentation. To achieve aim (A2), we propose a novel extension of Masked Autoencoders (MAEs) for self pre-training, i.e., models pre-trained on the same target dataset, specifically for 3D medical image segmentation.

Scientific visualization is a powerful approach for understanding and analyzing various physical or natural phenomena, such as climate change or chemical reactions. However, the cost of scientific simulations is high when factors like time, ensemble, and multivariate analyses are involved. Additionally, because of limited I/O bandwidth and storage space, scientists can only afford to store sparsely the simulation outputs (e.g., scalar field data), visual representations (e.g., streamlines), or visualization images. Therefore, in this dissertation, we seek to address this critical problem: How can we develop efficient and effective DL algorithms for scientific data generation and compression while reducing simulation and storage costs?

To tackle this problem: First, we propose a DL framework that generates unsteady vector field data from a set of streamlines. With this method, domain scientists only need to store representative streamlines at simulation time and reconstruct vector fields during post-processing. Second, we design a novel DL method that translates scalar fields to vector fields. Using this approach, domain scientists only need to store scalar field data at simulation time and can generate vector fields from their scalar field counterparts afterward. Third, we present a new DL approach that compresses a large collection of visualization images generated from time-varying data for communicating volume visualization results.
Replication Data for: Reform and Representation: A New Method Applied to...
dataverse.harvard.edu
Updated Sep 20, 2016
Boris Shor; Thad Kousser; Justin Phillips (2016). Replication Data for: Reform and Representation: A New Method Applied to Recent Electoral Changes [Dataset]. http://doi.org/10.7910/DVN/5AYBI9
Available format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/5AYBI9
Dataset updated
Sep 20, 2016
Dataset provided by
Harvard Dataverse
Authors
Boris Shor; Thad Kousser; Justin Phillips
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Can electoral reforms such as an independent redistricting commission and the top-two primary create conditions that lead to better legislative representation? We explore this question by presenting a new method for measuring a key indicator of representation – the congruence between a legislator's ideological position and the average position of her district's voters. Our novel approach combines two methods: the joint classification of voters and political candidates on the same ideological scale, along with multilevel regression and post-stratification to estimate the position of the average voter across many districts in multiple elections. After validating our approach, we use it to study the recent impact of reforms in California, showing that they did not bring their hoped-for effects.
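The post-stratification step of MRP reduces to a population-weighted average of the cell-level estimates produced by the multilevel model, and the congruence measure then compares that district average with the legislator's position. The sketch below assumes the cell estimates (mean ideal point per demographic cell, on the same scale as the legislator) and census counts are already available, and takes congruence as a simple absolute distance; the cell names, numbers, and function names are hypothetical, and the authors' exact distance measure may differ.

```python
def poststratify(cell_estimates: dict[str, float], cell_counts: dict[str, int]) -> float:
    """Population-weighted mean of cell-level estimates (the 'P' step of MRP)."""
    total = sum(cell_counts[c] for c in cell_estimates)
    return sum(cell_estimates[c] * cell_counts[c] for c in cell_estimates) / total

def congruence_gap(legislator_position: float, district_mean: float) -> float:
    """Distance between a legislator's ideal point and the district's average voter."""
    return abs(legislator_position - district_mean)

# Hypothetical district with three demographic cells on a common ideological scale.
estimates = {"young_urban": -0.6, "old_urban": -0.1, "rural": 0.5}
counts = {"young_urban": 3000, "old_urban": 5000, "rural": 2000}
district_mean = poststratify(estimates, counts)  # about -0.13
gap = congruence_gap(0.4, district_mean)         # about 0.53
```

Repeating this for every district and election gives the district-level averages that the congruence comparison needs, even where direct survey samples per district are small.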
Classification of web-based Digital Humanities projects leveraging...
zenodo.org
data-staging.niaid.nih.gov
csv, tsv
Updated Nov 10, 2025
Tommaso Battisti (2025). Classification of web-based Digital Humanities projects leveraging information visualisation techniques [Dataset]. http://doi.org/10.5281/zenodo.14192758
Available download formats: tsv, csv
Unique identifier
https://doi.org/10.5281/zenodo.14192758
Dataset updated
Nov 10, 2025
Dataset provided by
Zenodo
Authors
Tommaso Battisti
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description

This dataset contains a list of 186 Digital Humanities projects leveraging information visualisation techniques. Each project has been classified according to visualisation and interaction methods, narrativity and narrative solutions, domain, methods for the representation of uncertainty and interpretation, and the employment of critical and custom approaches to visually represent humanities data.

Classification schema: categories and columns

The project_id column contains unique internal identifiers assigned to each project. Meanwhile, the last_access column records the most recent date (in DD/MM/YYYY format) on which each project was reviewed based on the web address specified in the url column.
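Because `last_access` uses day-first DD/MM/YYYY dates rather than ISO 8601, it needs an explicit format string when parsed. A minimal standard-library sketch, assuming the tab-separated export has a header row with the column names given above; the two sample rows and their example.org URLs are hypothetical and only illustrate the layout.

```python
import csv
import io
from datetime import datetime

def load_projects(text: str) -> list[dict]:
    """Parse the tab-separated export, converting last_access to a datetime.date."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        # Day comes first: 05/11/2024 is 5 November 2024, not 11 May.
        row["last_access"] = datetime.strptime(row["last_access"], "%d/%m/%Y").date()
        rows.append(row)
    return rows

# Hypothetical two-row sample containing only the three identifier columns.
sample = (
    "project_id\turl\tlast_access\n"
    "p001\thttps://example.org/a\t05/11/2024\n"
    "p002\thttps://example.org/b\t20/10/2024\n"
)
projects = load_projects(sample)
latest = max(projects, key=lambda p: p["last_access"])  # most recently reviewed project
```

Parsing with `%d/%m/%Y` rather than relying on a locale-dependent default avoids silently swapping day and month for dates such as 05/11/2024.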
The remaining columns can be grouped into descriptive categories aimed at characterising projects according to different aspects:

Narrativity. It reports the presence of information visualisation techniques employed within narrative structures. Here, the term narrative encompasses both author-driven linear data stories and more user-directed experiences where the narrative sequence is determined by user exploration [1]. We define two columns to identify projects using visualisation techniques in narrative or non-narrative sections. Both conditions can be true for projects employing visualisations in both contexts. Columns:

non_narrative (boolean)

narrative (boolean)

Domain. The humanities domain to which the project is related. We rely on [2] and the chapters of the first part of [3] to abstract a set of general domains. Column:

domain (categorical):

History and archaeology

Art and art history

Language and literature

Music and musicology

Multimedia and performing arts

Philosophy and religion

Other: both extra-list domains and cases of collections without a unique or specific thematic focus.

Visualisation of uncertainty and interpretation. Building upon the frameworks proposed by [4] and [5], a set of categories was identified, highlighting a distinction between precise and impressional communication of uncertainty. Precise methods explicitly represent quantifiable uncertainty, such as missing, unknown, or uncertain data, precisely locating and categorising it using visual variables and positioning. The two sub-categories are: interactive distinction, when uncertain data is not visually distinguishable from the rest of the data but can be dynamically isolated or included/excluded categorically through interaction techniques (usually filters); and visual distinction, when uncertainty visually “emerges” from the representation by means of dedicated glyphs and spatial or visual cues and variables. Impressional methods, on the other hand, communicate the constructed and situated nature of data [6], exposing the interpretative layer of the visualisation and indicating more abstract and unquantifiable uncertainty using graphical aids or interpretative metrics. The two sub-categories are: ambiguation, when graphical expedients (such as permeable glyph boundaries or broken lines) visually convey the ambiguity of a phenomenon; and interpretative metrics, when expressive, non-scientific, or non-punctual metrics are used to build a visualisation. Column:

uncertainty_interpretation (categorical):

Interactive distinction

Visual distinction

Ambiguation

Interpretative metrics

Critical adaptation. We identify projects in which, with regard to at least one visualisation, the following criteria are fulfilled: 1) it avoids repurposing prepackaged, generic-use, or ready-made solutions; 2) it is tailored and unique, reflecting the peculiarities of the phenomena at hand; 3) it avoids simplifications in order to embrace and depict complexity, promoting time-consuming visualisation-based inquiry. Column:

critical_adaptation (boolean)

Non-temporal visualisation techniques. We adopt and partially adapt the terminology and definitions from [7]. A column is defined for each type of visualisation and accounts for its presence within a project, also including stacked layouts and more complex variations. Columns and inclusion criteria:

plot (boolean): visual representations that map data points onto a two-dimensional coordinate system.

cluster_or_set (boolean): sets or cluster-based visualisations used to unveil possible inter-object similarities.

map (boolean): geographical maps used to show spatial insights. While we do not specify the variants of maps (e.g., pin maps, dot density maps, flow maps, etc.), we make an exception for maps where each data point is represented by another visualisation (e.g., a map where each data point is a pie chart) by accounting for the presence of both in their respective columns.

network (boolean): visual representations highlighting relational aspects through nodes connected by links or edges.

hierarchical_diagram (boolean): tree-like structures such as tree diagrams, radial trees, and also dendrograms. They differ from networks in their strictly hierarchical structure and the absence of closed connection loops.

treemap (boolean): still hierarchical, but highlighting quantities expressed by means of area size. It also includes circle packing variants.

word_cloud (boolean): clouds of words, where each instance’s size is proportional to its frequency in a related context.

bars (boolean): includes bar charts, histograms, and variants. It coincides with “bar charts” in [7] but with a more generic term to refer to all bar-based visualisations.

line_chart (boolean): the display of information as sequential data points connected by straight-line segments.

area_chart (boolean): similar to a line chart but with a filled area below the segments. It also includes density plots.

pie_chart (boolean): circular graphs divided into slices which can also use multi-level solutions.

plot_3d (boolean): plots that use a third dimension to encode an additional variable.

proportional_area (boolean): representations used to compare values through area size. Typically, using circle- or square-like shapes.

other (boolean): it includes all other types of non-temporal visualisations that do not fall into the aforementioned categories.

Temporal visualisations and encodings. In addition to non-temporal visualisations, a group of techniques to encode temporality is considered in order to enable comparisons with [7]. Columns:

timeline (boolean): the display of a list of data points or spans in chronological order. They include timelines working either with a scale or simply displaying events in sequence. As in [7], we also include structured solutions resembling Gantt chart layouts.

temporal_dimension (boolean): reports when time is mapped to any dimension of a visualisation, excluding timelines. We use the term “dimension” rather than “axis” (as in [7]) because it is more appropriate for radial layouts or more complex representational choices.

animation (boolean): temporality is perceived through an animation changing the visualisation according to time flow.

visual_variable (boolean): another visual encoding strategy is used to represent any temporality-related variable (e.g., colour).

Interactions. A set of categories to assess the interactions a project affords, based on the concept of user intent [8] and user-allowed perceptualisation data actions [9]. The following categories roughly match the manipulative subset of the methods describing “how” an interaction is performed in the conception of [10]. Only interactions that affect the appearance of the visualisation or the visual representation of its data points, symbols, and glyphs are taken into consideration. Columns:

basic_selection (boolean): the demarcation of an element either for the duration of the interaction or more permanently until the occurrence of another selection.

advanced_selection (boolean): the demarcation involves both the selected element and connected elements within the visualisation or leads to brush and link effects across views. Basic selection is tacitly implied.

navigation (boolean): interactions that allow moving, zooming, panning, rotating, and scrolling the view but only when applied to the visualisation and not to the web page. It also includes “drill” interactions (to navigate through different levels or portions of data detail, often generating a new view that replaces or accompanies the original) and “expand” interactions generating new perspectives on data by expanding and collapsing nodes.

arrangement (boolean): the organisation of visualisation elements (symbols, glyphs, etc.) or multi-visualisation layouts spatially through drag and drop or
Data from: Vector representations of Polish words (Word2Vec method)
live.european-language-grid.eu
binary format
Updated Nov 6, 2016
(2016). Vector representations of Polish words (Word2Vec method) [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/20234
Available download formats: binary format
Dataset updated
Nov 6, 2016
License
https://www.gnu.org/licenses/lgpl-3.0-standalone.html
Description
Skip-gram model with vectors of length 100, trained on kgr 10, a corpus of over 4 billion tokens. Data preprocessing involved segmentation, lemmatization, and morphosyntactic disambiguation with multiword expression (MWE) annotation.
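If "binary format" here means the binary layout written by the original word2vec C tool (an assumption; the listing does not specify it), each file starts with an ASCII header `<vocab_size> <dim>`, followed, for each word, by the UTF-8 word, a space, and `dim` float32 values (little-endian on typical x86 machines). A minimal standard-library reader and a cosine similarity are sketched below; in practice, gensim's `KeyedVectors.load_word2vec_format(path, binary=True)` handles this layout. The function names and the two-word toy file are hypothetical.

```python
import io
import math
import struct

def read_word2vec_binary(stream) -> dict[str, list[float]]:
    """Read vectors stored in the word2vec C tool's binary layout (assumed little-endian)."""
    vocab_size, dim = map(int, stream.readline().split())
    vectors = {}
    for _ in range(vocab_size):
        word = bytearray()
        while True:
            ch = stream.read(1)
            if not ch:
                raise EOFError("file ended inside a word")
            if ch == b" ":
                break            # a single space separates the word from its vector
            if ch != b"\n":      # skip the newline that ends the previous entry
                word.extend(ch)
        values = struct.unpack(f"<{dim}f", stream.read(4 * dim))
        vectors[word.decode("utf-8")] = list(values)
    return vectors

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def encode_entry(word: str, vec: list[float]) -> bytes:
    """Build one entry of a toy file in the same layout (hypothetical helper for testing)."""
    return word.encode("utf-8") + b" " + struct.pack(f"<{len(vec)}f", *vec) + b"\n"

# Toy 2-word, 3-dimensional file; the real vectors have length 100.
blob = b"2 3\n" + encode_entry("kot", [1.0, 0.0, 0.0]) + encode_entry("pies", [0.6, 0.8, 0.0])
vecs = read_word2vec_binary(io.BytesIO(blob))
sim = cosine(vecs["kot"], vecs["pies"])
```

Reading the word byte by byte and decoding only at the end keeps multi-byte Polish characters intact.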
Sales Data Presentation - Dashboards
kaggle.com
zip
Updated Nov 29, 2023
Satya Manidhar V (2023). Sales Data Presentation - Dashboards [Dataset]. https://www.kaggle.com/datasets/satyamanidharv/sales-data-presentation-dashboards
Available download formats: zip (763979 bytes)
Dataset updated
Nov 29, 2023
Authors
Satya Manidhar V
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
In today's data-driven world, extracting meaningful insights from vast amounts of information is crucial for informed decision-making. This presentation tackles the challenge of creating presentable data visualizations based on employee type and region of sales.

Leveraging the power of PivotTables in Microsoft Excel, we will delve into a comprehensive approach to transforming raw sales data into compelling visual representations. By mastering PivotTable techniques, we will gain insights into employee sales trends, identify top performers, and uncover regional sales patterns.
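A two-way PivotTable of total sales by employee type and region is, underneath, a group-by-and-sum. The sketch below reproduces that aggregation outside Excel, assuming each record carries `employee_type`, `region`, and `sales` fields; these field names and the sample records are hypothetical, since the dataset's actual column names are not given in this listing.

```python
from collections import defaultdict

def pivot_sum(records, row_key: str, col_key: str, value_key: str) -> dict:
    """Sum value_key for every (row_key, col_key) pair, like a two-way PivotTable."""
    table = defaultdict(lambda: defaultdict(float))
    for rec in records:
        table[rec[row_key]][rec[col_key]] += rec[value_key]
    return {row: dict(cols) for row, cols in table.items()}

# Hypothetical sales records.
sales = [
    {"employee_type": "Full-time", "region": "North", "sales": 1200.0},
    {"employee_type": "Full-time", "region": "South", "sales": 800.0},
    {"employee_type": "Part-time", "region": "North", "sales": 300.0},
    {"employee_type": "Full-time", "region": "North", "sales": 500.0},
]
pivot = pivot_sum(sales, "employee_type", "region", "sales")
# Row with the largest grand total, analogous to sorting a PivotTable by its total column.
top = max(pivot, key=lambda row: sum(pivot[row].values()))
```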
Controlled feature selection and compressive big data analytics:...
plos.figshare.com
docx
Updated May 30, 2023
Simeone Marino; Jiachen Xu; Yi Zhao; Nina Zhou; Yiwang Zhou; Ivo D. Dinov (2023). Controlled feature selection and compressive big data analytics: Applications to biomedical and health studies [Dataset]. http://doi.org/10.1371/journal.pone.0202674
Available download formats: docx
Unique identifier
https://doi.org/10.1371/journal.pone.0202674
Dataset updated
May 30, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Simeone Marino; Jiachen Xu; Yi Zhao; Nina Zhou; Yiwang Zhou; Ivo D. Dinov
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The theoretical foundations of Big Data Science are not yet fully developed. This study proposes a new scalable framework for Big Data representation, high-throughput analytics (variable selection and noise reduction), and model-free inference. Specifically, we explore the core principles of distribution-free and model-agnostic methods for scientific inference based on Big Data sets. Compressive Big Data analytics (CBDA) iteratively generates random (sub)samples from a big and complex dataset. This subsampling with replacement is conducted at both the feature and case levels and results in samples that are not necessarily consistent or congruent across iterations. The approach relies on an ensemble predictor in which established model-based or model-free inference techniques are iteratively applied to preprocessed and harmonized samples. Repeating the subsampling and prediction steps many times yields derived likelihoods, probabilities, or parameter estimates, which can be used to assess the reliability of the algorithm and the accuracy of findings via bootstrapping methods, or to extract important features via controlled variable selection. CBDA provides a scalable algorithm for addressing some of the data and analytics challenges associated with complex, incongruent, incomplete, and multi-source data. Although not yet fully developed, a CBDA mathematical framework will enable the study of the ergodic properties and the asymptotics of specific statistical inference approaches via CBDA. We implemented the high-throughput CBDA method in pure R as well as via the graphical pipeline environment. To validate the technique, we used several simulated datasets as well as a real neuroimaging-genetics case study of Alzheimer’s disease. The CBDA approach may be customized to provide a generic representation of complex multimodal datasets and to provide stable scientific inference for large, incomplete, and multisource datasets.
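The subsample-fit-aggregate loop described above can be illustrated with a small NumPy sketch. It is not the published CBDA implementation (which is in R and the graphical pipeline environment): it assumes cases are drawn with replacement, features are drawn as a random subset per iteration, ordinary least squares is the base learner, and each feature is scored by how often it lands in the top `top_k` absolute coefficients when drawn. The function name `cbda_feature_frequency` and all defaults are hypothetical.

```python
import numpy as np

def cbda_feature_frequency(X, y, n_iter=200, case_frac=0.5, feat_frac=0.3, top_k=2, seed=0):
    """Rate at which each feature ranks in the top_k of models fitted on random subsamples."""
    rng = np.random.default_rng(seed)
    n_cases, n_feats = X.shape
    m = max(1, int(case_frac * n_cases))
    p = max(top_k, int(feat_frac * n_feats))
    selected = np.zeros(n_feats)
    drawn = np.zeros(n_feats)
    for _ in range(n_iter):
        cases = rng.choice(n_cases, size=m, replace=True)    # case-level bootstrap
        feats = rng.choice(n_feats, size=p, replace=False)   # random feature subset
        A = np.column_stack([np.ones(m), X[np.ix_(cases, feats)]])
        coef, *_ = np.linalg.lstsq(A, y[cases], rcond=None)
        scores = np.abs(coef[1:])                            # drop the intercept
        drawn[feats] += 1
        selected[feats[np.argsort(scores)[-top_k:]]] += 1
    # Normalise by how often each feature was drawn at all.
    return np.divide(selected, drawn, out=np.zeros(n_feats), where=drawn > 0)

# Hypothetical simulated data: only features 0 and 1 carry signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=300)
rates = cbda_feature_frequency(X, y)
ranking = np.argsort(rates)[::-1]  # most frequently selected features first
```

Normalising by the number of draws keeps features that happened to be sampled less often from being penalised, which matters because each iteration sees only a fraction of the features.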
Data from: Construction of symmetric group representation matrices and...
elsevier.digitalcommonsdata.com
Updated Jan 1, 1981
Cite
M.F. Soto (1981). Construction of symmetric group representation matrices and states [Dataset]. http://doi.org/10.17632/48cg7gf5sy.1
Unique identifier
https://doi.org/10.17632/48cg7gf5sy.1
Dataset updated
Jan 1, 1981
Authors
M.F. Soto
License
https://www.elsevier.com/about/policies/open-access-licenses/elsevier-user-license/cpc-license/
Description
Title of program: SYMRPMAT
Catalogue Id: AAMF_v1_0

Nature of problem: To find the explicit representation matrices for every permutation and every representation of any symmetric group.

Versions of this program held in the CPC repository in Mendeley Data: aamf_v1_0; SYMRPMAT; 10.1016/0010-4655(81)90132-6

This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2019)
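The entry states the problem but not the algorithm SYMRPMAT uses. As an illustration only, one standard way to obtain explicit matrices is Young's orthogonal form: the basis is indexed by standard Young tableaux of a partition, each adjacent transposition s_i = (i, i+1) acts through the axial distance r (content of i+1 minus content of i), and any permutation's matrix is a product of these generator matrices. The Python sketch below is not the program's code; all function names are hypothetical.

```python
import math


def standard_tableaux(shape):
    """Enumerate the standard Young tableaux of the partition `shape`.
    Each tableau is a dict mapping entry k (1..n) to its (row, col) cell."""
    n = sum(shape)
    found = []

    def place(k, rows, cells):
        if k > n:
            found.append(dict(cells))
            return
        for r, length in enumerate(rows):
            # Next free cell in row r is (r, length); the cell above it must be filled.
            if length < shape[r] and (r == 0 or rows[r - 1] > length):
                rows[r] += 1
                cells[k] = (r, length)
                place(k + 1, rows, cells)
                del cells[k]
                rows[r] -= 1

    place(1, [0] * len(shape), {})
    return found


def tableau_key(t):
    return tuple(t[k] for k in sorted(t))


def generator_matrix(tableaux, i):
    """Young's orthogonal form of s_i = (i, i+1); columns are indexed by tableaux."""
    index = {tableau_key(t): idx for idx, t in enumerate(tableaux)}
    d = len(tableaux)
    m = [[0.0] * d for _ in range(d)]
    for col, t in enumerate(tableaux):
        # Axial distance: content (col - row) of i+1 minus content of i.
        r = (t[i + 1][1] - t[i + 1][0]) - (t[i][1] - t[i][0])
        m[col][col] = 1.0 / r
        if abs(r) > 1:  # swapping i and i+1 gives another standard tableau
            swapped = dict(t)
            swapped[i], swapped[i + 1] = t[i + 1], t[i]
            m[index[tableau_key(swapped)]][col] = math.sqrt(1.0 - 1.0 / r ** 2)
    return m


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def representation_matrix(shape, perm):
    """Matrix of the permutation `perm` (perm[j] = image of j, 0-indexed)."""
    tableaux = standard_tableaux(shape)
    d = len(tableaux)
    gens = {i: generator_matrix(tableaux, i) for i in range(1, sum(shape))}
    # Bubble-sort perm into the identity; swapping positions (j, j+1) composes
    # perm on the right with the adjacent transposition s_{j+1}.
    arr, word = list(perm), []
    changed = True
    while changed:
        changed = False
        for j in range(len(arr) - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                word.append(j + 1)
                changed = True
    # perm = s_{w_k} o ... o s_{w_1}, so multiply generator matrices in reverse order.
    result = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
    for i in reversed(word):
        result = matmul(result, gens[i])
    return result
```

For the partition (2, 1) of S3 this yields 2×2 matrices whose traces give the familiar character values (0 on transpositions, -1 on 3-cycles). Because the generator matrices satisfy the Coxeter relations, the matrix of a composed permutation equals the product of the individual matrices.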
Replication Data for: Candidate Supply is Not a Barrier to Immigrant...
search.dataone.org
dataverse.harvard.edu
Updated Nov 22, 2023
Cite
Dancygier, Rafaela; Lindgren, Karl-Oskar; Nyman, Pär; Vernby, Kåre (2023). Replication Data for: Candidate Supply is Not a Barrier to Immigrant Representation: A Case–Control Study [Dataset]. http://doi.org/10.7910/DVN/7GPSCA
Unique identifier
https://doi.org/10.7910/DVN/7GPSCA
Dataset updated
Nov 22, 2023
Dataset provided by
Harvard Dataverse
Authors
Dancygier, Rafaela; Lindgren, Karl-Oskar; Nyman, Pär; Vernby, Kåre
Description
Immigrants are underrepresented in most democratic parliaments. To explain the immigrant-native representation gap, existing research emphasizes party gatekeepers and structural conditions. But a more complete account must consider the possibility that the representation gap already begins at the supply stage. Are immigrants simply less interested in elected office? To test this explanation, we carried out an innovative case-control survey in Sweden. We surveyed elected politicians, candidates for local office, and residents who have not run, stratified these samples by immigrant status, and linked all respondents to local political opportunity structures. We find that differences in political ambition, interest, and efficacy do not help explain immigrants’ underrepresentation. Instead, the major hurdles lie in securing a candidate nomination and being placed on an electable list position. We conclude that there is a sufficient supply of potential immigrant candidates, but immigrants' ambition is thwarted by political elites.
Replication data for: Multidimensional Representation
search.dataone.org
dataverse.harvard.edu
Updated Nov 22, 2023
Cite
Wolkenstein, Fabio; Wratil, Christopher (2023). Replication data for: Multidimensional Representation [Dataset]. http://doi.org/10.7910/DVN/79RI4R
Unique identifier
https://doi.org/10.7910/DVN/79RI4R
Dataset updated
Nov 22, 2023
Dataset provided by
Harvard Dataverse
Authors
Wolkenstein, Fabio; Wratil, Christopher
Description
The study of representation is a major research field in quantitative political science. Since the early 2000s, it has been accompanied by a range of important conceptual innovations by political theorists working on the topic. Yet, although many quantitative scholars are familiar with the conceptual literature, even the most complex quantitative studies eschew engaging with the “new wave” of more sophisticated concepts of representation that theorists have developed. We discuss what we take to be the main reasons for this gap between theory and empirics, and present four novel conceptions of representation that are both sensitive to theorists’ conceptual impulses and operationalizable for quantitative scholars. In doing so, we advance an alternative research agenda on representation that moves significantly beyond the status quo of the field.
Data from: Simplicity of K-means versus deepness of Deep Learning. A Case of...
purr.purdue.edu
Updated Oct 7, 2016
Cite
Murat Dundar; Qiang Kou; Baichuan Zhang; Yicheng He; Bartlomiej Rajwa (2016). Simplicity of K-means versus deepness of Deep Learning. A Case of Unsupervised Feature Learning with Limited Data [Dataset]. http://doi.org/10.4231/R7N58J9Z
Unique identifier
https://doi.org/10.4231/R7N58J9Z
Dataset updated
Oct 7, 2016
Dataset provided by
PURR
Authors
Murat Dundar; Qiang Kou; Baichuan Zhang; Yicheng He; Bartlomiej Rajwa
Description
A study contrasting K-means-based unsupervised feature learning with deep learning techniques for small datasets with limited intra- and inter-class diversity.
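The entry does not describe the study's pipeline. As a hedged illustration of what "K-means-based unsupervised feature learning" commonly means, the Python sketch below learns centroids with Lloyd's algorithm and then encodes each sample with a "triangle" activation, max(0, mean distance minus distance to centroid j), so each sample becomes a sparse k-dimensional feature vector. The farthest-first initialization and all function names are assumptions chosen to keep the example deterministic.

```python
import math


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50):
    """Lloyd's algorithm with a deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all current centroids.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids


def triangle_encode(p, centroids):
    """Map a sample to k features: max(0, mean distance - distance to centroid j)."""
    dists = [math.sqrt(sq_dist(p, c)) for c in centroids]
    mean = sum(dists) / len(dists)
    return [max(0.0, mean - d) for d in dists]
```

The learned features can then feed any simple classifier. With few labeled samples, this two-stage approach has far fewer parameters to fit than a deep network, which is the trade-off the study's title points to.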
Data from: Data-based System Representation and Synchronization for...
data.uni-hannover.de
zip
Updated Aug 7, 2024
Cite
Institut für Regelungstechnik (2024). Data-based System Representation and Synchronization for Multiagent Systems [Dataset]. https://data.uni-hannover.de/dataset/data-based-system-representation-and-synchronization-for-multiagent-systems
Available download formats: zip
Dataset updated
Aug 7, 2024
Dataset authored and provided by
Institut für Regelungstechnik
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
[*] Victor G. Lopez and Matthias A. Müller, "Data-based System Representation and Synchronization for Multiagent Systems". Proceedings of the IEEE Conference on Decision and Control 2024.

This code simulates the solutions proposed in [*] for the data-based synchronization problem of homogeneous and heterogeneous agents. The file DBsynchronization_homogeneous.m implements the method described in Section III for a multiagent system with one leader and three followers; this example was not shown in [*] for space reasons. The file DBsynchronization_heterogeneous.m uses the method in Section IV for a multiagent system with one leader and four followers, as shown in the paper.
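The entry does not spell out how the data-based representation is built, so the following is only a generic illustration, not the method of [*] or the MATLAB files. It assumes a noise-free discrete-time linear system x(k+1) = A x(k) + B u(k) with measured states and exactly n + m samples. If the stacked data matrix [U0; X0] is invertible, the system matrices follow from [B A] = X1 [U0; X0]^(-1). Function names are hypothetical.

```python
def mat_inv(M):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("data matrix is singular: inputs are not exciting enough")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def identify_from_data(xs, us):
    """xs: T+1 state vectors, us: T input vectors, with T = n + m.
    Returns (A, B) reproducing x(k+1) = A x(k) + B u(k) on the data."""
    n, m, T = len(xs[0]), len(us[0]), len(us)
    if T != n + m:
        raise ValueError("this sketch expects exactly n + m samples")
    X0 = [[xs[k][i] for k in range(T)] for i in range(n)]
    X1 = [[xs[k + 1][i] for k in range(T)] for i in range(n)]
    U0 = [[us[k][j] for k in range(T)] for j in range(m)]
    BA = matmul(X1, mat_inv(U0 + X0))  # [B A] = X1 [U0; X0]^(-1)
    return [row[m:] for row in BA], [row[:m] for row in BA]
```

With longer or noisy data sets one would typically replace the inverse with a pseudoinverse or least-squares fit; the square case is used here only to keep the sketch dependency-free.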
