Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Public speaking is an important skill, the acquisition of which requires dedicated and time-consuming training. In recent years, researchers have started to investigate automatic methods to support public speaking skills training. These methods include assessment of the trainee's oral presentation delivery skills, which may be accomplished through automatic understanding and processing of social and behavioral cues displayed by the presenter. In this study, we propose an automatic scoring system for presentation delivery skills that uses a novel active data representation method to automatically rate segments of a full video presentation. While most approaches have employed a two-step strategy of detecting multiple events followed by classification, which involves annotating data to build the different event detectors and generating a data representation from their output for classification, our method does not require event detectors. The proposed data representation is generated in an unsupervised fashion from low-level audiovisual descriptors using self-organizing maps and is used for video classification. This representation is also used to analyse video segments within a full video presentation in terms of several characteristics of the presenter's performance. The audio representation provides the best prediction results for self-confidence and enthusiasm, posture and body language, structure and connection of ideas, and overall presentation delivery. The video data representation provides the best results for presentation of relevant information with good pronunciation, usage of language according to the audience, and maintenance of adequate voice volume for the audience. The fusion of audio and video data provides the best results for eye contact. Applications of the method to providing feedback to teachers and trainees are discussed.
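For illustration, a minimal sketch of the kind of unsupervised, event-detector-free representation described above is given below. It assumes the third-party minisom package, synthetic low-level descriptors, and a histogram-of-activations reading of the active data representation; the paper's exact formulation may differ.

```python
# Hypothetical sketch: unsupervised segment representation via a self-organizing map.
# Assumes the `minisom` package and synthetic descriptors; not the paper's exact method.
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(0)
frames = rng.normal(size=(2000, 40))     # low-level audiovisual descriptors, one row per frame

som = MiniSom(8, 8, input_len=frames.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(frames, 5000)           # unsupervised training, no event annotations needed

def segment_representation(segment):
    """Histogram of best-matching-unit activations over a video segment."""
    hist = np.zeros((8, 8))
    for frame in segment:
        i, j = som.winner(frame)
        hist[i, j] += 1
    return (hist / len(segment)).ravel()  # fixed-length vector fed to a classifier

rep = segment_representation(frames[:300])
```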
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
A recent proteomics-grade (95%+ sequence reliability) high-throughput de novo sequencing method utilizes the benefits of high resolution, high mass accuracy, and the use of two complementary fragmentation techniques, collision-activated dissociation (CAD) and electron capture dissociation (ECD). With this high-fidelity sequencing approach, hundreds of peptides can be sequenced de novo in a single LC−MS/MS experiment. The high productivity of the new analysis technique has revealed a new bottleneck, which occurs in data representation. Here we suggest a new method of data analysis and visualization that presents a comprehensive picture of the peptide content, including relative abundances and grouping into families. The 2D mass mapping consists of putting the molecular masses onto a two-dimensional bubble plot, with the relative monoisotopic mass defect and isotopic shift as the axes and with the bubble area proportional to the peptide abundance. Peptides belonging to the same family form a compact group on such a plot, so that the family identity can in many cases be determined from the molecular mass alone. The performance of the method is demonstrated on the high-throughput analysis of skin secretion from three frogs, Rana ridibunda, Rana arvalis, and Rana temporaria. Two-dimensional mass maps simplify the task of global comparison between the species and make obvious the similarities and differences in the peptide contents that are obscure in traditional data presentation methods. Even the biological activity of a peptide can sometimes be inferred from its position on the plot. Two-dimensional mass mapping is a general method applicable to any complex mixture, peptide and nonpeptide alike.
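As a rough illustration of the 2D mass mapping described above, the sketch below builds a bubble plot with matplotlib from synthetic peptide masses. The axis definitions (mass defect relative to the nominal mass, and isotopic shift as average minus monoisotopic mass) are assumptions made for the example and may differ from the paper's exact formulas.

```python
# Hypothetical sketch of a 2D mass map: bubble plot of relative mass defect vs. isotopic
# shift, bubble area proportional to abundance. Values below are synthetic.
import numpy as np
import matplotlib.pyplot as plt

monoisotopic = np.array([2876.4, 1824.9, 3312.7, 2455.1])   # Da, synthetic peptide masses
average      = np.array([2878.2, 1826.0, 3314.8, 2456.6])   # Da, synthetic average masses
abundance    = np.array([3.0, 1.2, 0.6, 2.1])               # relative abundances

rel_mass_defect = (monoisotopic - np.round(monoisotopic)) / monoisotopic  # assumed definition
isotopic_shift  = average - monoisotopic                                  # assumed definition

plt.scatter(rel_mass_defect, isotopic_shift, s=abundance * 300, alpha=0.5)
plt.xlabel("relative monoisotopic mass defect")
plt.ylabel("isotopic shift (Da)")
plt.title("2D mass map (synthetic example)")
plt.show()
```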
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
In this paper, I use joint scaling methods and similar items from three large-scale surveys to place voters, parties, and politicians from different Latin American countries on a common ideological space. The findings reveal that ideology is a significant determinant of vote choice in Latin America. They also suggest that the success of leftist leaders at the polls reflects the views of the voters sustaining their victories. The locations of parties and leaders reveal that three distinctive clusters exist: one located at the left of the political spectrum, another at the center, and a third on the right. The results also indicate that legislators in Brazil, Mexico, and Peru tend to be more "leftist" than their voters. The ideological drift, however, is not significant enough to substantiate the view that a disconnect between voters and politicians lies behind the success of leftist presidents in these countries. These findings highlight the importance of using a common-space scale to compare disparate populations and call into question a number of recent studies by scholars of Latin American politics that fail to adequately address this important issue.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The city of Austin administered a community survey for the years 2015, 2016, 2017, 2018, and 2019 (https://data.austintexas.gov/City-Government/Community-Survey/s2py-ceb7) to “assess satisfaction with the delivery of the major City Services and to help determine priorities for the community as part of the City’s ongoing planning process.” To directly access this dataset from the city of Austin’s website, you can follow this link: https://cutt.ly/VNqq5Kd. Although we downloaded the dataset analyzed in this study from the former link, given that the city of Austin intends to continue administering this survey, there is a chance that the data we used for this analysis and the data hosted on the city of Austin’s website may differ in the following years. Accordingly, to ensure the replication of our findings, we recommend that researchers download and analyze the dataset we employed in our analyses, which can be accessed at the following link: https://github.com/democratizing-data-science/MDCOR/blob/main/Community_Survey.csv. Replication Features or Variables: The community survey data have 10,684 rows and 251 columns. Of these columns, our analyses rely on the following three indicators, taken verbatim from the survey: “ID”; “Q25 - If there was one thing you could share with the Mayor regarding the City of Austin (any comment, suggestion, etc.), what would it be?”; and “Do you own or rent your home?”
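A minimal sketch for loading the replication file and keeping the three indicators is shown below; it assumes pandas, uses the raw-file counterpart of the GitHub link above, and guesses the exact column headers, so inspect df.columns before relying on the selection.

```python
# Hedged sketch: load the replication CSV and keep the ID, Q25, and own/rent columns.
# The raw URL is inferred from the GitHub blob link above; column names are assumptions.
import pandas as pd

url = ("https://raw.githubusercontent.com/democratizing-data-science/MDCOR/"
       "main/Community_Survey.csv")
df = pd.read_csv(url)
print(df.shape)  # expected: (10684, 251)

keep = [c for c in df.columns
        if c == "ID" or c.startswith("Q25") or "own or rent" in c.lower()]
analysis = df[keep]
```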
https://www.law.cornell.edu/uscode/text/17/106
Medical image analysis is critical to biological studies, health research, computer-aided diagnoses, and clinical applications. Recently, deep learning (DL) techniques have achieved remarkable successes in medical image analysis applications. However, these techniques typically require large amounts of annotations to achieve satisfactory performance. Therefore, in this dissertation, we seek to address this critical problem: How can we develop efficient and effective DL algorithms for medical image analysis while reducing annotation efforts? To address this problem, we have outlined two specific aims: (A1) Utilize existing annotations effectively from advanced models; (A2) extract generic knowledge directly from unannotated images.
To achieve aim (A1): First, we introduce a new data representation called TopoImages, which encodes the local topology of all the image pixels. TopoImages can be complemented with the original images to improve medical image analysis tasks. Second, we propose a new augmentation method, SAMAug-C, that leverages the Segment Anything Model (SAM) to augment raw image input and enhance medical image classification. Third, we propose two advanced DL architectures, kCBAC-Net and ConvFormer, to enhance the performance of 2D and 3D medical image segmentation. We also present a gate-regularized network training (GrNT) approach to improve multi-scale fusion in medical image segmentation. To achieve aim (A2), we propose a novel extension of Masked Autoencoders (MAEs) for self pre-training, i.e., models pre-trained on the same target dataset, specifically for 3D medical image segmentation.
Scientific visualization is a powerful approach for understanding and analyzing various physical or natural phenomena, such as climate change or chemical reactions. However, the cost of scientific simulations is high when factors like time, ensemble, and multivariate analyses are involved. Additionally, scientists can only afford to sparsely store the simulation outputs (e.g., scalar field data), visual representations (e.g., streamlines), or visualization images due to limited I/O bandwidths and storage space. Therefore, in this dissertation, we seek to address this critical problem: How can we develop efficient and effective DL algorithms for scientific data generation and compression while reducing simulation and storage costs?
To tackle this problem: First, we propose a DL framework that generates unsteady vector field data from a set of streamlines. Based on this method, domain scientists only need to store representative streamlines at simulation time and reconstruct vector fields during post-processing. Second, we design a novel DL method that translates scalar fields to vector fields. Using this approach, domain scientists only need to store scalar field data at simulation time and generate vector fields from their scalar field counterparts afterward. Third, we present a new DL approach that compresses a large collection of visualization images generated from time-varying data for communicating volume visualization results.
https://www.elsevier.com/about/policies/open-access-licenses/elsevier-user-license/cpc-license/
Title of program: MATTAB Catalogue Id: AAMH_v1_0
Nature of problem: To check that the representation matrices produced by SYMRPMAT satisfy the group table produced by SYMGRPTB.
Versions of this program held in the CPC repository in Mendeley Data aamh_v1_0; MATTAB; 10.1016/0010-4655(81)90132-6
This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2019)
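The check described under "Nature of problem" amounts to verifying D(g_i) D(g_j) = D(g_i g_j) for every entry of the group multiplication table. A small numpy sketch of that idea, using an illustrative two-element group rather than actual SYMRPMAT/SYMGRPTB output, might look like this:

```python
# Hypothetical sketch of the MATTAB-style check: representation matrices D(g) must
# satisfy the group multiplication table, i.e. D(g_i) @ D(g_j) == D(g_i g_j).
import numpy as np

D = {0: np.eye(2), 1: np.array([[0.0, 1.0], [1.0, 0.0]])}   # illustrative representation matrices
table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}        # illustrative group table (C2)

ok = all(np.allclose(D[i] @ D[j], D[table[(i, j)]]) for (i, j) in table)
print("representation consistent with group table:", ok)
```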
https://www.elsevier.com/about/policies/open-access-licenses/elsevier-user-license/cpc-license/
Abstract: A fast algorithm to compute irreducible integer representations of the symmetric group is described. The representation is called tilted because the identity is not represented by a unit matrix, but by a matrix β satisfying a reduced characteristic equation of the form (β - I)^k = 0. A distinctive feature of the approach is that the non-zero matrix elements are restricted to ±1. A so-called natural representation is obtained by multiplying each representation matrix by β^(-1). Alternatively t...
Title of program: TMRP Catalogue Id: ADBC_v1_0
Nature of problem: Irreducible integer representations of the permutation group are computed.
Versions of this program held in the CPC repository in Mendeley Data ADBC_v1_0; TMRP; 10.1016/0010-4655(95)00009-5
This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2019)
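A small numpy sketch of the two properties quoted in the abstract, i.e. that the tilted identity β satisfies (β - I)^k = 0 and that a natural representation follows by multiplying by β^(-1), is given below; β here is an illustrative unipotent matrix, not actual TMRP output.

```python
# Hypothetical illustration of the tilted-representation properties described above.
import numpy as np

beta = np.array([[1.0, 1.0], [0.0, 1.0]])        # illustrative example: (beta - I)^2 = 0
k = 2
nilpotent = np.linalg.matrix_power(beta - np.eye(2), k)
print("reduced characteristic equation holds:", np.allclose(nilpotent, 0))

D_tilted = beta.copy()                           # identity element in the tilted representation
D_natural = D_tilted @ np.linalg.inv(beta)       # natural representation: identity becomes I
print(D_natural)
```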
This dataset was created by kirtida dalvi
This dataset was created by M. Raza Siddique
https://spdx.org/licenses/CC0-1.0.html
Tables and charts have long been seen as effective ways to convey data. Much attention has been focused on improving charts, following ideas of human perception and brain function. Tables can also be viewed as two-dimensional representations of data, yet it is only fairly recently that we have begun to apply principles of design that aid the communication of information between the author and reader. In this study, we collated guidelines for the design of data and statistical tables. These guidelines fall under three principles: aiding comparisons, reducing visual clutter, and increasing readability. We surveyed tables published in recent issues of 43 journals in the fields of ecology and evolutionary biology for their adherence to these three principles, as well as the author guidelines on journal publisher websites. We found that most of the over 1,000 tables we sampled had no heavy grid lines and little visual clutter. They were also easy to read, with clear headers and horizontal orientation. However, most tables did not aid the vertical comparison of numeric data. We suggest that authors could improve their tables by right-flushing numeric columns typeset in a tabular font, clearly identifying statistical significance, and using clear titles and captions. Journal publishers could easily implement these formatting guidelines when typesetting manuscripts.

Methods: Once we had established the above principles of table design, we assessed their use in issues of 43 widely read ecology and evolution journals (SI 2). Between January and July 2022, we reviewed the tables in the most recent issue published by these journals. For journals without issues (such as Annual Review of Ecology, Evolution, and Systematics, or Biological Conservation), we examined the tables in issues published in a single month or in the entire most recent volume if the journal published few papers each month. We reviewed only articles in a traditionally typeset format and published as a PDF or in print. We did not examine the tables in online versions of articles. Having identified all tables for review, we assessed whether these tables followed the above-described best-practice principles for table design and, if not, we noted the way in which these tables failed to meet the outlined guidelines. We initially both reviewed the same 10 tables to ensure that we agreed in our assessment of whether these tables followed each of the principles. Having ensured agreement on how to classify tables, we proceeded to review all subsequent journals individually, while resolving any uncertainties collaboratively. These preliminary table evaluations also showed that whether tables used long format or a tabular font was hard to evaluate objectively without knowing the data or the font used. Therefore, we did not systematically review the extent to which these two guidelines were adhered to.
https://www.archivemarketresearch.com/privacy-policy
The global market for data lens (visualizations of data) is experiencing robust growth, driven by the increasing adoption of data analytics across diverse industries. This market, estimated at $50 billion in 2025, is projected to achieve a compound annual growth rate (CAGR) of 15% from 2025 to 2033. This expansion is fueled by several key factors. Firstly, the rising volume and complexity of data necessitate effective visualization tools for insightful analysis. Businesses are increasingly relying on interactive dashboards and data storytelling techniques to derive actionable intelligence from their data, fostering the demand for sophisticated data visualization solutions. Secondly, advancements in artificial intelligence (AI) and machine learning (ML) are enhancing the capabilities of data visualization platforms, enabling automated insights generation and predictive analytics. This creates new opportunities for vendors to offer more advanced and user-friendly tools. Finally, the growing adoption of cloud-based solutions is further accelerating market growth, offering enhanced scalability, accessibility, and cost-effectiveness. The market is segmented across various types, including points, lines, and bars, and applications, ranging from exploratory data analysis and interactive data visualization to descriptive statistics and advanced data science techniques. Major players like Tableau, Sisense, and Microsoft dominate the market, constantly innovating to meet evolving customer needs and competitive pressures. The geographical distribution of the market reveals strong growth across North America and Europe, driven by early adoption and technological advancements. However, emerging markets in Asia-Pacific and the Middle East & Africa are showing significant growth potential, fueled by increasing digitalization and investment in data analytics infrastructure. Restraints to growth include the high cost of implementation, the need for skilled professionals to effectively utilize these tools, and security concerns related to data privacy. Nonetheless, the overall market outlook remains positive, with continued expansion anticipated throughout the forecast period due to the fundamental importance of data visualization in informed decision-making across all sectors.
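For orientation only, the figures quoted above imply roughly the following end-of-forecast market size (this is simple compounding of the stated base and CAGR, not a number taken from the report):

```python
# Implied 2033 market size from the figures above: $50B in 2025 growing at 15% CAGR.
base_2025 = 50e9
implied_2033 = base_2025 * 1.15 ** (2033 - 2025)
print(f"implied 2033 market size: ${implied_2033 / 1e9:.0f}B")   # roughly $153B
```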
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset used in the article entitled 'Synthetic Datasets Generator for Testing Information Visualization and Machine Learning Techniques and Tools'. These datasets can be used to test several characteristics of machine learning and data-processing algorithms.
In 2022, online surveys were the most used traditional quantitative methodology in the market research industry worldwide. During the survey, 95 percent of respondents stated that they regularly used this method. Data visualization/dashboards came second, with 90 percent of respondents giving this answer.
This dataset was created by Chandra Shekhar
Released under Other (specified in description)
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Ali Amr
Released under Apache 2.0
This dataset was created by PavanSai2545
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Improving the accuracy of predictions of future values from past and current observations has been pursued by enhancing prediction methods, combining those methods, or performing data pre-processing. In this paper, another approach is taken, namely increasing the number of inputs in the dataset. This approach is useful especially for shorter time series data. By filling in the in-between values in the time series, the size of the training set can be increased, thus increasing the generalization capability of the predictor. The algorithm used to make predictions is a Neural Network, as it is widely used in the literature for time series tasks. For comparison, Support Vector Regression is also employed. The dataset used in the experiment is the frequency of USPTO patents and PubMed scientific publications in the field of health, namely on Apnea, Arrhythmia, and Sleep Stages. Another time series dataset, designated for the NN3 Competition in the field of transportation, is also used for benchmarking. The experimental results show that prediction performance can be significantly increased by filling in-between data in the time series. Furthermore, detrending and deseasonalization, which separate the data into trend, seasonal, and stationary time series, also improve prediction performance on both the original and the filled dataset. The optimal expansion of the dataset in this experiment is about five times the length of the original dataset.
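A hedged sketch of the core augmentation idea, expanding a short series roughly five-fold by filling in-between values before fitting a predictor, is shown below; the linear interpolation and simple detrending are illustrative assumptions, since the abstract does not fix the exact scheme.

```python
# Sketch of the augmentation idea: insert interpolated "in-between" values to expand a
# short time series (~5x here) before training a predictor such as a neural network.
import numpy as np

series = np.array([12.0, 15.0, 14.0, 18.0, 21.0, 19.0])     # short original series
factor = 5                                                   # ~5x expansion, per the paper

x_old = np.arange(len(series))
x_new = np.linspace(0, len(series) - 1, factor * len(series))
filled = np.interp(x_new, x_old, series)                     # linearly filled series

# Simple detrending before model fitting (deseasonalization would follow similarly).
trend = np.polyval(np.polyfit(x_new, filled, 1), x_new)
stationary = filled - trend

print(len(series), "->", len(filled), "training points")
```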
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Objectives: Traditionally, summarization of research themes and trends within a given discipline was accomplished by manual review of scientific works in the field. However, with the ushering in of the age of "big data", new methods for discovery of such information become necessary as traditional techniques become increasingly difficult to apply due to the exponential growth of document repositories. Our objectives are to develop a pipeline for unsupervised theme extraction and summarization of thematic trends in document repositories, and to test it by applying it to a specific domain. Methods: To that end, we detail a pipeline that utilizes machine learning and natural language processing for unsupervised theme extraction, a novel method for summarization of thematic trends, and network mapping for visualization of thematic relations. We then apply this pipeline to a collection of anesthesiology abstracts. Results: We demonstrate how this pipeline enables discovery of major themes and temporal trends in anesthesiology research and facilitates document classification and corpus exploration. Discussion: The relation of prevalent topics and extracted trends to recent events in both anesthesiology and healthcare in general demonstrates the pipeline's utility. Furthermore, the agreement between the unsupervised thematic grouping and human-assigned classification validates the pipeline's accuracy and demonstrates another potential use. Conclusion: The described pipeline enables summarization and exploration of large document repositories, facilitates classification, and aids in trend identification. A more robust and user-friendly interface will facilitate the expansion of this methodology to other domains. This will be the focus of future work for our group.
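As one possible instantiation of the unsupervised theme-extraction step (the abstract does not name a specific model), the sketch below fits a small LDA topic model over placeholder abstracts with scikit-learn; the actual pipeline adds trend summarization and network mapping on top of such a step.

```python
# Hedged sketch of unsupervised theme extraction via LDA; corpus and model choice are
# illustrative assumptions, not the paper's exact pipeline.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [
    "postoperative pain management with regional anesthesia",
    "machine learning prediction of intraoperative hypotension",
    "airway management during rapid sequence induction",
]  # placeholder corpus; the study used anesthesiology abstracts

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(abstracts)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)            # per-document theme mixtures

terms = vectorizer.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [terms[i] for i in comp.argsort()[-5:][::-1]]
    print(f"theme {k}: {top}")
```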
In 2007, the California Ocean Protection Council initiated the California Seafloor Mapping Program (CSMP), designed to create a comprehensive seafloor map of high-resolution bathymetry, marine benthic habitats, and geology within California’s State Waters. The program supports a large number of coastal-zone- and ocean-management issues, including the California Marine Life Protection Act (MLPA) (California Department of Fish and Wildlife, 2008), which requires information about the distribution of ecosystems as part of the design and proposal process for the establishment of Marine Protected Areas. A focus of CSMP is to map California’s State Waters with consistent methods at a consistent scale. The CSMP approach is to create highly detailed seafloor maps through collection, integration, interpretation, and visualization of swath sonar data (the undersea equivalent of satellite remote-sensing data in terrestrial mapping), acoustic backscatter, seafloor video, seafloor photography, high-resolution seismic-reflection profiles, and bottom-sediment sampling data. The map products display seafloor morphology and character, identify potential marine benthic habitats, and illustrate both the surficial seafloor geology and shallow (to about 100 m) subsurface geology. It is emphasized that the more interpretive habitat and geology data rely on the integration of multiple, new high-resolution datasets and that mapping at small scales would not be possible without such data. This approach and CSMP planning is based in part on recommendations of the Marine Mapping Planning Workshop (Kvitek and others, 2006), attended by coastal and marine managers and scientists from around the state. That workshop established geographic priorities for a coastal mapping project and identified the need for coverage of “lands” from the shore strand line (defined as Mean Higher High Water; MHHW) out to the 3-nautical-mile (5.6-km) limit of California’s State Waters. Unfortunately, surveying the zone from MHHW out to 10-m water depth is not consistently possible using ship-based surveying methods, owing to sea state (for example, waves, wind, or currents), kelp coverage, and shallow rock outcrops. Accordingly, some of the data presented in this series commonly do not cover the zone from the shore out to 10-m depth. This data is part of a series of online U.S. Geological Survey (USGS) publications, each of which includes several map sheets, some explanatory text, and a descriptive pamphlet. Each map sheet is published as a PDF file. Geographic information system (GIS) files that contain both ESRI ArcGIS raster grids (for example, bathymetry, seafloor character) and geotiffs (for example, shaded relief) are also included for each publication. For those who do not own the full suite of ESRI GIS and mapping software, the data can be read using ESRI ArcReader, a free viewer that is available at http://www.esri.com/software/arcgis/arcreader/index.html (last accessed September 20, 2013). The California Seafloor Mapping Program is a collaborative venture between numerous different federal and state agencies, academia, and the private sector. 
CSMP partners include the California Coastal Conservancy, the California Ocean Protection Council, the California Department of Fish and Wildlife, the California Geological Survey, California State University at Monterey Bay’s Seafloor Mapping Lab, Moss Landing Marine Laboratories Center for Habitat Studies, Fugro Pelagos, Pacific Gas and Electric Company, National Oceanic and Atmospheric Administration (NOAA, including National Ocean Service–Office of Coast Surveys, National Marine Sanctuaries, and National Marine Fisheries Service), U.S. Army Corps of Engineers, the Bureau of Ocean Energy Management, the National Park Service, and the U.S. Geological Survey. These web services for the Santa Barbara Channel map area include data layers associated with the GIS files and map sheets available from the USGS CSMP web page at https://walrus.wr.usgs.gov/mapping/csmp/index.html. Each published CSMP map area includes a data catalog of geographic information system (GIS) files; map sheets that contain explanatory text; and an associated descriptive pamphlet. This web service represents the available data layers for this map area. Data from different sonar surveys were combined to generate comprehensive high-resolution bathymetry and acoustic-backscatter coverage of the map area. These data reveal a range of physiographic features, including exposed bedrock outcrops and large fields of sand waves, as well as many human impacts on the seafloor. To validate geological and biological interpretations of the sonar data, the U.S. Geological Survey towed a camera sled over specific offshore locations, collecting both video and photographic imagery; these “ground-truth” surveying data are available from the CSMP Video and Photograph Portal at https://doi.org/10.5066/F7J1015K. The “seafloor character” data layer shows classifications of the seafloor on the basis of depth, slope, rugosity (ruggedness), and backscatter intensity, further informed by the ground-truth-survey imagery. The “potential habitats” polygons are delineated on the basis of substrate type, geomorphology, seafloor process, or other attributes that may provide a habitat for a specific species or assemblage of organisms. Representative seismic-reflection profile data from the map area are also included and provide information on the subsurface stratigraphy and structure of the map area. The distribution and thickness of young sediment (deposited over the past about 21,000 years, during the most recent sea-level rise) is interpreted on the basis of the seismic-reflection data. The geologic polygons merge onshore geologic mapping (compiled from existing maps by the California Geological Survey) and new offshore geologic mapping that is based on integration of high-resolution bathymetry and backscatter imagery, seafloor-sediment and rock samples, digital camera and video imagery, and high-resolution seismic-reflection profiles. The information provided by the map sheets, pamphlet, and data catalog has a broad range of applications. High-resolution bathymetry, acoustic backscatter, ground-truth-surveying imagery, and habitat mapping all contribute to habitat characterization and ecosystem-based management by providing essential data for delineation of marine protected areas and ecosystem restoration. Many of the maps provide high-resolution baselines that will be critical for monitoring environmental change associated with climate change, coastal development, or other forcings.
High-resolution bathymetry is a critical component for modeling coastal flooding caused by storms and tsunamis, as well as inundation associated with longer term sea-level rise. Seismic-reflection and bathymetric data help characterize earthquake and tsunami sources, critical for natural-hazard assessments of coastal zones. Information on sediment distribution and thickness is essential to the understanding of local and regional sediment transport, as well as the development of regional sediment-management plans. In addition, siting of any new offshore infrastructure (for example, pipelines, cables, or renewable-energy facilities) will depend on high-resolution mapping. Finally, this mapping will both stimulate and enable new scientific research and also raise public awareness of, and education about, coastal environments and issues. Web services were created using an ArcGIS service definition file. The ArcGIS REST service and OGC WMS service include all Santa Barbara Channel map area data layers. Data layers are symbolized as shown on the associated map sheets.
https://data.4tu.nl/info/fileadmin/user_upload/Documenten/4TU.ResearchData_Restricted_Data_2022.pdf
The data and codes were prepared and uploaded to 4TU.ResearchData by Yuanchen Zeng to support the published results in his paper: Zeng Y, Shen C, Nunez A, Dollevoet R, Zhang W, Li Z. (2023). An interpretable method for operational modal analysis in time-frequency representation and its applications to railway sleepers. Structural Control and Health Monitoring, 2023: 6420772.