100+ datasets found
  1. Data from: Data_Sheet_1_An Active Data Representation of Videos for Automatic Scoring of Oral Presentation Delivery Skills and Feedback Generation.PDF

    • figshare.com
    pdf
    Updated Mar 6, 2020
    Cite
    Fasih Haider; Maria Koutsombogera; Owen Conlan; Carl Vogel; Nick Campbell; Saturnino Luz (2020). Data_Sheet_1_An Active Data Representation of Videos for Automatic Scoring of Oral Presentation Delivery Skills and Feedback Generation.PDF [Dataset]. http://doi.org/10.3389/fcomp.2020.00001.s001
    Available download formats: pdf
    Dataset updated
    Mar 6, 2020
    Dataset provided by
    Frontiers
    Authors
    Fasih Haider; Maria Koutsombogera; Owen Conlan; Carl Vogel; Nick Campbell; Saturnino Luz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Public speaking is an important skill, the acquisition of which requires dedicated and time consuming training. In recent years, researchers have started to investigate automatic methods to support public speaking skills training. These methods include assessment of the trainee's oral presentation delivery skills which may be accomplished through automatic understanding and processing of social and behavioral cues displayed by the presenter. In this study, we propose an automatic scoring system for presentation delivery skills using a novel active data representation method to automatically rate segments of a full video presentation. While most approaches have employed a two step strategy consisting of detecting multiple events followed by classification, which involve the annotation of data for building the different event detectors and generating a data representation based on their output for classification, our method does not require event detectors. The proposed data representation is generated unsupervised using low-level audiovisual descriptors and self-organizing mapping and used for video classification. This representation is also used to analyse video segments within a full video presentation in terms of several characteristics of the presenter's performance. The audio representation provides the best prediction results for self-confidence and enthusiasm, posture and body language, structure and connection of ideas, and overall presentation delivery. The video data representation provides the best results for presentation of relevant information with good pronunciation, usage of language according to audience, and maintenance of adequate voice volume for the audience. The fusion of audio and video data provides the best results for eye contact. Applications of the method to provision of feedback to teachers and trainees are discussed.
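
    The abstract above describes the method only in prose. As a rough, hedged illustration of the general idea (an unsupervised, fixed-length representation of a video segment built by training a self-organizing map on low-level frame descriptors and histogramming its winning units), here is a minimal Python sketch using the third-party MiniSom library. The feature arrays, grid size, and training settings are placeholders, not the authors' configuration.

```python
# Minimal sketch of an SOM-based segment representation (not the authors' code).
# Assumes frame-level audiovisual descriptors have been extracted elsewhere.
import numpy as np
from minisom import MiniSom  # pip install minisom

def fit_som(frames, grid=(8, 8), iters=5000):
    som = MiniSom(grid[0], grid[1], frames.shape[1],
                  sigma=1.0, learning_rate=0.5, random_seed=0)
    som.random_weights_init(frames)
    som.train_random(frames, iters)
    return som

def segment_representation(som, frames, grid=(8, 8)):
    # Histogram of best-matching SOM units over the segment's frames.
    hist = np.zeros(grid[0] * grid[1])
    for f in frames:
        i, j = som.winner(f)
        hist[i * grid[1] + j] += 1
    return hist / max(len(frames), 1)

# Placeholder data: 10 segments, 120 frames each, 20-dimensional descriptors.
segments = [np.random.rand(120, 20) for _ in range(10)]
som = fit_som(np.vstack(segments))
X = np.array([segment_representation(som, s) for s in segments])  # classifier input
```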

  2. Data from: Two Dimensional Mass Mapping as a General Method of Data Representation in Comprehensive Analysis of Complex Molecular Mixtures

    • figshare.com
    • acs.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Konstantin A. Artemenko; Alexander R. Zubarev; Tatiana Yu Samgina; Albert T. Lebedev; Mikhail M. Savitski; Roman A. Zubarev (2023). Two Dimensional Mass Mapping as a General Method of Data Representation in Comprehensive Analysis of Complex Molecular Mixtures [Dataset]. http://doi.org/10.1021/ac802532j.s002
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    ACS Publications
    Authors
    Konstantin A. Artemenko; Alexander R. Zubarev; Tatiana Yu Samgina; Albert T. Lebedev; Mikhail M. Savitski; Roman A. Zubarev
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    A recent proteomics-grade (95%+ sequence reliability) high-throughput de novo sequencing method utilizes the benefits of high resolution, high mass accuracy, and the use of two complementary fragmentation techniques collision-activated dissociation (CAD) and electron capture dissociation (ECD). With this high-fidelity sequencing approach, hundreds of peptides can be sequenced de novo in a single LC−MS/MS experiment. The high productivity of the new analysis technique has revealed a new bottleneck which occurs in data representation. Here we suggest a new method of data analysis and visualization that presents a comprehensive picture of the peptide content including relative abundances and grouping into families. The 2D mass mapping consists of putting the molecular masses onto a two-dimensional bubble plot, with the relative monoisotopic mass defect and isotopic shift being the axes and with the bubble area proportional to the peptide abundance. Peptides belonging to the same family form a compact group on such a plot, so that the family identity can in many cases be determined from the molecular mass alone. The performance of the method is demonstrated on the high-throughput analysis of skin secretion from three frogs, Rana ridibunda, Rana arvalis, and Rana temporaria. Two dimensional mass maps simplify the task of global comparison between the species and make obvious the similarities and differences in the peptide contents that are obscure in traditional data presentation methods. Even biological activity of the peptide can sometimes be inferred from its position on the plot. Two dimensional mass mapping is a general method applicable to any complex mixture, peptide and nonpeptide alike.
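
    The description above specifies the plot construction precisely: relative monoisotopic mass defect and isotopic shift as the axes, with bubble area proportional to peptide abundance. A hedged matplotlib sketch of such a 2D mass map (placeholder values, not data from the paper) could look like this:

```python
# Hedged sketch of a 2D mass map bubble plot (not the authors' code).
import numpy as np
import matplotlib.pyplot as plt

def mass_map(mass_defect, isotopic_shift, abundance):
    abundance = np.asarray(abundance, dtype=float)
    area = 2000.0 * abundance / abundance.max()   # bubble area ~ abundance
    plt.scatter(mass_defect, isotopic_shift, s=area, alpha=0.5, edgecolors="k")
    plt.xlabel("Relative monoisotopic mass defect")
    plt.ylabel("Isotopic shift")
    plt.title("2D mass map")
    plt.show()

# Placeholder peptide values for illustration only.
mass_map(mass_defect=[0.41, 0.43, 0.55],
         isotopic_shift=[1.02, 1.05, 1.11],
         abundance=[10.0, 3.5, 7.2])
```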

  3. Replication data for: Using Joint Scaling Methods to Study Ideology, and Representation: Evidence from Latin America

    • dataverse.harvard.edu
    • data.niaid.nih.gov
    Updated Mar 11, 2015
    Cite
    Sebastian Saiegh (2015). Replication data for: Using Joint Scaling Methods to Study Ideology, and Representation: Evidence from Latin America [Dataset]. http://doi.org/10.7910/DVN/29342
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 11, 2015
    Dataset provided by
    Harvard Dataverse
    Authors
    Sebastian Saiegh
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Latin America
    Description

    In this paper, I use joint scaling methods and similar items from three large-scale surveys to place voters, parties and politicians from different Latin American countries on a common ideological space. The findings reveal that ideology is a significant determinant of vote choice in Latin America. They also suggest that the success of leftist leaders at the polls reflects the views of the voters sustaining their victories. The locations of parties and leaders reveal that three distinctive clusters exist: one located at the left of the political spectrum, another at the center, and a third on the right. The results also indicate that legislators in Brazil, Mexico and Peru tend to be more "leftist" than their voters. The ideological drift, however, is not significant enough to substantiate the view that a disconnect between voters and politicians lies behind the success of leftist presidents in these countries. These findings highlight the importance of using a common-space scale to compare disparate populations and call into question a number of recent studies by scholars of Latin American politics who fail to adequately address this important issue.

  4. Austin_Survey_for_MDCOR_Analyses

    • data.mendeley.com
    Updated Nov 14, 2022
    Cite
    Manuel Gonzalez Canche (2022). Austin_Survey_for_MDCOR_Analyses [Dataset]. http://doi.org/10.17632/nb7yvhjvzk.1
    Dataset updated
    Nov 14, 2022
    Authors
    Manuel Gonzalez Canche
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Austin
    Description

    The city of Austin administered a community survey for the years 2015 through 2019 (https://data.austintexas.gov/City-Government/Community-Survey/s2py-ceb7) to “assess satisfaction with the delivery of the major City Services and to help determine priorities for the community as part of the City’s ongoing planning process.” The dataset can also be accessed directly from the city of Austin’s website via https://cutt.ly/VNqq5Kd. Although we downloaded the dataset analyzed in this study from that link, the city of Austin intends to continue administering this survey, so the data we used for this analysis and the data hosted on the city of Austin’s website may diverge in the following years. Accordingly, to ensure the replication of our findings, we recommend that researchers download and analyze the dataset we employed in our analyses, which can be accessed at https://github.com/democratizing-data-science/MDCOR/blob/main/Community_Survey.csv.

    Replication Features or Variables: The community survey data has 10,684 rows and 251 columns. Of these columns, our analyses rely on the following three indicators, taken verbatim from the survey: “ID”; “Q25 - If there was one thing you could share with the Mayor regarding the City of Austin (any comment, suggestion, etc.), what would it be?”; and “Do you own or rent your home?”
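
    Since the description names the exact replication file and the three indicators used, a hedged pandas sketch of loading them might look like the following. The raw-file URL is derived from the GitHub link above, and the long question label is copied from the description; both may need adjusting against the actual CSV header.

```python
# Hedged sketch: load the replication CSV and keep the three indicators
# named in the dataset description (column labels may differ slightly).
import pandas as pd

RAW_URL = ("https://raw.githubusercontent.com/democratizing-data-science/"
           "MDCOR/main/Community_Survey.csv")

df = pd.read_csv(RAW_URL)
print(df.shape)  # expected: (10684, 251) per the description

cols = [
    "ID",
    "Q25 - If there was one thing you could share with the Mayor regarding "
    "the City of Austin (any comment, suggestion, etc.), what would it be?",
    "Do you own or rent your home?",
]
subset = df[[c for c in cols if c in df.columns]]  # guard against label drift
```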

  5. Data from: New Deep Learning Methods for Medical Image Analysis and Scientific Data Generation and Compression

    • curate.nd.edu
    pdf
    Updated Nov 11, 2024
    Cite
    Pengfei Gu (2024). New Deep Learning Methods for Medical Image Analysis and Scientific Data Generation and Compression [Dataset]. http://doi.org/10.7274/26156719.v1
    Available download formats: pdf
    Dataset updated
    Nov 11, 2024
    Dataset provided by
    University of Notre Dame
    Authors
    Pengfei Gu
    License

    https://www.law.cornell.edu/uscode/text/17/106

    Description

    Medical image analysis is critical to biological studies, health research, computer-aided diagnoses, and clinical applications. Recently, deep learning (DL) techniques have achieved remarkable successes in medical image analysis applications. However, these techniques typically require large amounts of annotations to achieve satisfactory performance. Therefore, in this dissertation, we seek to address this critical problem: How can we develop efficient and effective DL algorithms for medical image analysis while reducing annotation efforts? To address this problem, we have outlined two specific aims: (A1) Utilize existing annotations effectively from advanced models; (A2) Extract generic knowledge directly from unannotated images.

    To achieve the aim (A1): First, we introduce a new data representation called TopoImages, which encodes the local topology of all the image pixels. TopoImages can be complemented with the original images to improve medical image analysis tasks. Second, we propose a new augmentation method, SAMAug-C, that leverages the Segment Anything Model (SAM) to augment raw image input and enhance medical image classification. Third, we propose two advanced DL architectures, kCBAC-Net and ConvFormer, to enhance the performance of 2D and 3D medical image segmentation. We also present a gate-regularized network training (GrNT) approach to improve multi-scale fusion in medical image segmentation. To achieve the aim (A2), we propose a novel extension of known Masked Autoencoders (MAEs) for self pre-training, i.e., models pre-trained on the same target dataset, specifically for 3D medical image segmentation.

    Scientific visualization is a powerful approach for understanding and analyzing various physical or natural phenomena, such as climate change or chemical reactions. However, the cost of scientific simulations is high when factors like time, ensemble, and multivariate analyses are involved. Additionally, scientists can only afford to sparsely store the simulation outputs (e.g., scalar field data) or visual representations (e.g., streamlines) or visualization images due to limited I/O bandwidths and storage space. Therefore, in this dissertation, we seek to address this critical problem: How can we develop efficient and effective DL algorithms for scientific data generation and compression while reducing simulation and storage costs?

    To tackle this problem: First, we propose a DL framework that generates unsteady vector fields data from a set of streamlines. Based on this method, domain scientists only need to store representative streamlines at simulation time and reconstruct vector fields during post-processing. Second, we design a novel DL method that translates scalar fields to vector fields. Using this approach, domain scientists only need to store scalar field data at simulation time and generate vector fields from their scalar field counterparts afterward. Third, we present a new DL approach that compresses a large collection of visualization images generated from time-varying data for communicating volume visualization results.

  6. Data from: Construction of symmetric group representation matrices and states

    • elsevier.digitalcommonsdata.com
    Updated Jan 1, 1981
    + more versions
    Cite
    M.F. Soto (1981). Construction of symmetric group representation matrices and states [Dataset]. http://doi.org/10.17632/svhvrypc4t.1
    Dataset updated
    Jan 1, 1981
    Authors
    M.F. Soto
    License

    https://www.elsevier.com/about/policies/open-access-licenses/elsevier-user-license/cpc-license/

    Description

    Title of program: MATTAB
    Catalogue Id: AAMH_v1_0

    Nature of problem: To check that the representation matrices produced by SYMRPMAT satisfy the group table produced by SYMGRPTB.

    Versions of this program held in the CPC repository in Mendeley Data: aamh_v1_0; MATTAB; 10.1016/0010-4655(81)90132-6

    This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2019)

  7. Data from: Tilted irreducible representations of the permutation group

    • elsevier.digitalcommonsdata.com
    Updated Jan 1, 1995
    Cite
    G. Bergdolt (1995). Tilted irreducible representations of the permutation group [Dataset]. http://doi.org/10.17632/rmb5p75p8n.1
    Dataset updated
    Jan 1, 1995
    Authors
    G. Bergdolt
    License

    https://www.elsevier.com/about/policies/open-access-licenses/elsevier-user-license/cpc-license/

    Description

    Abstract: A fast algorithm to compute irreducible integer representations of the symmetric group is described. The representation is called tilted because the identity is not represented by a unit matrix, but by a matrix β satisfying a reduced characteristic equation of the form (β - I)^k = 0. A distinctive feature of the approach is that the non-zero matrix elements are restricted to ±1. A so-called natural representation is obtained by multiplying each representation matrix by β^(-1). Alternatively t...

    Title of program: TMRP
    Catalogue Id: ADBC_v1_0

    Nature of problem: Irreducible integer representations of the permutation group are computed.

    Versions of this program held in the CPC repository in Mendeley Data: ADBC_v1_0; TMRP; 10.1016/0010-4655(95)00009-5

    This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2019)

  8. SALES REPORT

    • kaggle.com
    zip
    Updated Dec 31, 2022
    Cite
    kirtida dalvi (2022). SALES REPORT [Dataset]. https://www.kaggle.com/datasets/kirtidadalvi/sales-report/discussion
    Available download formats: zip (2512576 bytes)
    Dataset updated
    Dec 31, 2022
    Authors
    kirtida dalvi
    Description

    Dataset

    This dataset was created by kirtida dalvi


  9. DoS_TCP_3

    • kaggle.com
    zip
    Updated Oct 1, 2023
    + more versions
    Cite
    M. Raza Siddique (2023). DoS_TCP_3 [Dataset]. https://www.kaggle.com/datasets/razasiddique/dos-tcp-3
    Available download formats: zip (206723398 bytes)
    Dataset updated
    Oct 1, 2023
    Authors
    M. Raza Siddique
    Description

    Dataset

    This dataset was created by M. Raza Siddique


  10. Data from: Design of tables for the presentation and communication of data in ecological and evolutionary biology

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Aug 10, 2024
    + more versions
    Cite
    Miriam Remshard; Simon Queenborough (2024). Design of tables for the presentation and communication of data in ecological and evolutionary biology [Dataset]. http://doi.org/10.5061/dryad.jq2bvq8f3
    Available download formats: zip
    Dataset updated
    Aug 10, 2024
    Dataset provided by
    Yale University
    Authors
    Miriam Remshard; Simon Queenborough
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Tables and charts have long been seen as effective ways to convey data. Much attention has been focused on improving charts, following ideas of human perception and brain function. Tables can also be viewed as two-dimensional representations of data, yet it is only fairly recently that we have begun to apply principles of design that aid the communication of information between the author and reader. In this study, we collated guidelines for the design of data and statistical tables. These guidelines fall under three principles: aiding comparisons, reducing visual clutter, and increasing readability. We surveyed tables published in recent issues of 43 journals in the fields of ecology and evolutionary biology for their adherence to these three principles, as well as author guidelines on journal publisher websites. We found that most of the over 1,000 tables we sampled had no heavy grid lines and little visual clutter. They were also easy to read, with clear headers and horizontal orientation. However, most tables did not aid the vertical comparison of numeric data. We suggest that authors could improve their tables by right-flushing numeric columns typeset with a tabular font, clearly identifying statistical significance, and using clear titles and captions. Journal publishers could easily implement these formatting guidelines when typesetting manuscripts.

    Methods: Once we had established the above principles of table design, we assessed their use in issues of 43 widely read ecology and evolution journals (SI 2). Between January and July 2022, we reviewed the tables in the most recent issue published by these journals. For journals without issues (such as Annual Review of Ecology, Evolution, and Systematics, or Biological Conservation), we examined the tables in issues published in a single month or in the entire most recent volume if few papers were published in that journal on a monthly basis. We reviewed only articles in a traditionally typeset format and published as a PDF or in print. We did not examine the tables in online versions of articles. Having identified all tables for review, we assessed whether these tables followed the above-described best-practice principles for table design and, if not, we noted the way in which these tables failed to meet the outlined guidelines. We initially both reviewed the same 10 tables to ensure that we agreed in our assessment of whether these tables followed each of the principles. Having ensured agreement on how to classify tables, we proceeded to review all subsequent journals individually, while resolving any uncertainties collaboratively. These preliminary evaluations also showed that whether tables used long format or a tabular font was hard to assess objectively without knowing the data or the font used. Therefore, we did not systematically review the extent to which these two guidelines were adhered to.
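
    As a small, hedged illustration of two of the guidelines summarized above (right-flushed numeric columns to aid vertical comparison, and a single light rule instead of heavy grid lines), the following Python snippet prints a plain-text table; the values are placeholders, not data from the study.

```python
# Illustration of right-aligned numeric columns and minimal ruling (placeholder data).
rows = [("Species A", 12.4, 0.0310),
        ("Species B", 3.9, 0.4700),
        ("Species C", 120.0, 0.0009)]

header = f"{'Group':<10} {'Estimate':>10} {'P value':>10}"
print(header)
print("-" * len(header))               # one light rule, no vertical lines
for name, est, p in rows:
    print(f"{name:<10} {est:>10.1f} {p:>10.4f}")
```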

  11. Data Lens (Visualizations Of Data) Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Mar 6, 2025
    Cite
    AMA Research & Media LLP (2025). Data Lens (Visualizations Of Data) Report [Dataset]. https://www.archivemarketresearch.com/reports/data-lens-visualizations-of-data-48718
    Available download formats: pdf, doc, ppt
    Dataset updated
    Mar 6, 2025
    Dataset provided by
    AMA Research & Media
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global market for data lens (visualizations of data) is experiencing robust growth, driven by the increasing adoption of data analytics across diverse industries. This market, estimated at $50 billion in 2025, is projected to achieve a compound annual growth rate (CAGR) of 15% from 2025 to 2033. This expansion is fueled by several key factors. Firstly, the rising volume and complexity of data necessitate effective visualization tools for insightful analysis. Businesses are increasingly relying on interactive dashboards and data storytelling techniques to derive actionable intelligence from their data, fostering the demand for sophisticated data visualization solutions. Secondly, advancements in artificial intelligence (AI) and machine learning (ML) are enhancing the capabilities of data visualization platforms, enabling automated insights generation and predictive analytics. This creates new opportunities for vendors to offer more advanced and user-friendly tools. Finally, the growing adoption of cloud-based solutions is further accelerating market growth, offering enhanced scalability, accessibility, and cost-effectiveness. The market is segmented across various types, including points, lines, and bars, and applications, ranging from exploratory data analysis and interactive data visualization to descriptive statistics and advanced data science techniques. Major players like Tableau, Sisense, and Microsoft dominate the market, constantly innovating to meet evolving customer needs and competitive pressures. The geographical distribution of the market reveals strong growth across North America and Europe, driven by early adoption and technological advancements. However, emerging markets in Asia-Pacific and the Middle East & Africa are showing significant growth potential, fueled by increasing digitalization and investment in data analytics infrastructure. Restraints to growth include the high cost of implementation, the need for skilled professionals to effectively utilize these tools, and security concerns related to data privacy. Nonetheless, the overall market outlook remains positive, with continued expansion anticipated throughout the forecast period due to the fundamental importance of data visualization in informed decision-making across all sectors.
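
    For context, the implied end-of-forecast value follows directly from the quoted base and growth rate; this back-of-the-envelope check (the 2033 figure is derived here, not stated in the report summary) is:

```python
# Implied 2033 market size from the quoted $50B 2025 base and 15% CAGR.
base_2025_usd_bn = 50.0
cagr = 0.15
years = 2033 - 2025

implied_2033 = base_2025_usd_bn * (1 + cagr) ** years
print(f"Implied 2033 market size: ~${implied_2033:.0f}B")  # roughly $153B
```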

  12. Dataset of article: Synthetic Datasets Generator for Testing Information Visualization and Machine Learning Techniques and Tools

    • ieee-dataport.org
    Updated Mar 13, 2020
    + more versions
    Cite
    Sandro Mendonça (2020). Dataset of article: Synthetic Datasets Generator for Testing Information Visualization and Machine Learning Techniques and Tools [Dataset]. http://doi.org/10.21227/5aeq-rr34
    Dataset updated
    Mar 13, 2020
    Dataset provided by
    IEEE Dataport
    Authors
    Sandro Mendonça
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset used in the article entitled 'Synthetic Datasets Generator for Testing Information Visualization and Machine Learning Techniques and Tools'. These datasets can be used to test several characteristics in machine learning and data processing algorithms.

  13. Most used qualitative methods used in the market research industry worldwide 2022

    • statista.com
    Updated Apr 23, 2024
    Cite
    Statista (2024). Most used qualitative methods used in the market research industry worldwide 2022 [Dataset]. https://www.statista.com/statistics/875985/market-research-industry-use-of-traditional-qualitative-methods/
    Dataset updated
    Apr 23, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Oct 25, 2022 - Dec 16, 2022
    Area covered
    Worldwide
    Description

    In 2022, online surveys were the most used traditional qualitative methodology in the market research industry worldwide: 95 percent of respondents stated that they regularly used this method. Data visualization/dashboards ranked second, cited by 90 percent of respondents.

  14. Data from: Global Superstore

    • kaggle.com
    zip
    Updated Jul 16, 2020
    Cite
    Chandra Shekhar (2020). Global Superstore [Dataset]. https://www.kaggle.com/datasets/shekpaul/global-superstore
    Available download formats: zip (5985038 bytes)
    Dataset updated
    Jul 16, 2020
    Authors
    Chandra Shekhar
    Description

    Dataset

    This dataset was created by Chandra Shekhar

    Released under Other (specified in description)


  15. Used Cars

    • kaggle.com
    zip
    Updated Nov 6, 2023
    Cite
    ِAli Amr (2023). Used Cars [Dataset]. https://www.kaggle.com/datasets/aliamrali/used-cars/data
    Available download formats: zip (19104438 bytes)
    Dataset updated
    Nov 6, 2023
    Authors
    ِAli Amr
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by ِAli Amr

    Released under Apache 2.0


  16. Covid-19

    • kaggle.com
    zip
    Updated Oct 7, 2022
    + more versions
    Cite
    PavanSai2545 (2022). Covid-19 [Dataset]. https://www.kaggle.com/datasets/pavansai2545/covid19/discussion
    Available download formats: zip (1999073 bytes)
    Dataset updated
    Oct 7, 2022
    Authors
    PavanSai2545
    Description

    Dataset

    This dataset was created by PavanSai2545


  17. Data from: Enriching time series datasets using Nonparametric kernel regression to improve forecasting accuracy

    • figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Mohamad Ivan Fanany (2023). Enriching time series datasets using Nonparametric kernel regression to improve forecasting accuracy [Dataset]. http://doi.org/10.6084/m9.figshare.1609661.v1
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Mohamad Ivan Fanany
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Improving the accuracy of predicting future values from past and current observations has been pursued by enhancing the prediction methods, combining those methods, or performing data pre-processing. In this paper, another approach is taken, namely increasing the number of inputs in the dataset. This approach is especially useful for shorter time series data. By filling in the in-between values of the time series, the size of the training set can be increased, thus increasing the generalization capability of the predictor. The algorithm used to make predictions is a neural network, as it is widely used in the literature for time series tasks. For comparison, support vector regression is also employed. The datasets used in the experiment are the frequencies of USPTO patents and PubMed scientific publications in the field of health, namely on apnea, arrhythmia, and sleep stages. Another time series dataset, designated for the NN3 Competition in the field of transportation, is also used for benchmarking. The experimental results show that prediction performance can be significantly increased by filling in in-between data in the time series. Furthermore, detrending and deseasonalization, which separate the data into trend, seasonal, and stationary components, also improve prediction performance on both the original and the filled datasets. The optimal enlargement in this experiment is to about five times the length of the original dataset.
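
    As a rough, hedged sketch of the two ideas in this abstract (enlarging a short series by filling in-between values, and separating trend and seasonal components before fitting a predictor), the following Python snippet uses pandas and statsmodels on placeholder data; it illustrates the general approach, not the authors' code.

```python
# Hedged sketch: fill in-between values and detrend/deseasonalize a short series.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Short monthly series (placeholder values, not the USPTO/PubMed data).
idx = pd.date_range("2020-01-01", periods=36, freq="MS")
y = pd.Series(np.sin(np.arange(36) / 6.0) + 0.05 * np.arange(36), index=idx)

# (1) Enlarge the series: upsample to a semi-monthly grid and interpolate,
# roughly doubling the number of training points.
y_filled = y.resample("SMS").asfreq().interpolate("linear")

# (2) Split into trend, seasonal, and residual parts; a predictor can be
# trained on the more stationary residual and the parts added back later.
decomp = seasonal_decompose(y_filled, model="additive", period=24)
residual = decomp.resid.dropna()
```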

  18. Data from: Trends in anesthesiology research: a machine learning approach to theme discovery and summarization

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    Updated May 28, 2022
    Cite
    Alexander Rusanov; Riccardo Miotto; Chunhua Weng (2022). Data from: Trends in anesthesiology research: a machine learning approach to theme discovery and summarization [Dataset]. http://doi.org/10.5061/dryad.h86746g
    Dataset updated
    May 28, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alexander Rusanov; Riccardo Miotto; Chunhua Weng
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Objectives: Traditionally, summarization of research themes and trends within a given discipline was accomplished by manual review of scientific works in the field. However, with the ushering in of the age of "big data", new methods for discovery of such information become necessary as traditional techniques become increasingly difficult to apply due to the exponential growth of document repositories. Our objectives are to develop a pipeline for unsupervised theme extraction and summarization of thematic trends in document repositories, and to test it by applying it to a specific domain. Methods: To that end, we detail a pipeline which utilizes machine learning and natural language processing for unsupervised theme extraction, a novel method for summarization of thematic trends, and network mapping for visualization of thematic relations. We then apply this pipeline to a collection of anesthesiology abstracts. Results: We demonstrate how this pipeline enables discovery of major themes and temporal trends in anesthesiology research and facilitates document classification and corpus exploration. Discussion: The relation of prevalent topics and extracted trends to recent events in both anesthesiology and healthcare in general demonstrates the pipeline's utility. Furthermore, the agreement between the unsupervised thematic grouping and human-assigned classification validates the pipeline's accuracy and demonstrates another potential use. Conclusion: The described pipeline enables summarization and exploration of large document repositories, facilitates classification, and aids in trend identification. A more robust and user-friendly interface will facilitate the expansion of this methodology to other domains. This will be the focus of future work for our group.
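
    The pipeline itself is not included in this listing; as a generic, hedged illustration of the unsupervised theme-extraction step it describes (topic modeling over abstracts), a minimal scikit-learn sketch on placeholder text could look like this:

```python
# Generic topic-modeling sketch (LDA over abstracts); not the authors' pipeline.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [  # placeholder documents
    "postoperative pain management with regional anesthesia",
    "machine learning for airway assessment and difficult intubation",
    "propofol sedation depth monitoring in the intensive care unit",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(abstracts)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)            # per-document topic mixtures

terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):   # top words per extracted theme
    top = comp.argsort()[-5:][::-1]
    print(f"theme {k}:", ", ".join(terms[i] for i in top))
```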

  19. Data from: California State Waters Map Series--Santa Barbara Channel Web Services

    • data.amerigeoss.org
    • search.dataone.org
    • +1more
    xml
    Updated Aug 23, 2022
    + more versions
    Cite
    United States (2022). California State Waters Map Series--Santa Barbara Channel Web Services [Dataset]. https://data.amerigeoss.org/dataset/california-state-waters-map-series-santa-barbara-channel-web-services-b23aa
    Available download formats: xml
    Dataset updated
    Aug 23, 2022
    Dataset provided by
    United States
    Area covered
    Santa Barbara Channel
    Description

    In 2007, the California Ocean Protection Council initiated the California Seafloor Mapping Program (CSMP), designed to create a comprehensive seafloor map of high-resolution bathymetry, marine benthic habitats, and geology within California’s State Waters. The program supports a large number of coastal-zone- and ocean-management issues, including the California Marine Life Protection Act (MLPA) (California Department of Fish and Wildlife, 2008), which requires information about the distribution of ecosystems as part of the design and proposal process for the establishment of Marine Protected Areas. A focus of CSMP is to map California’s State Waters with consistent methods at a consistent scale. The CSMP approach is to create highly detailed seafloor maps through collection, integration, interpretation, and visualization of swath sonar data (the undersea equivalent of satellite remote-sensing data in terrestrial mapping), acoustic backscatter, seafloor video, seafloor photography, high-resolution seismic-reflection profiles, and bottom-sediment sampling data. The map products display seafloor morphology and character, identify potential marine benthic habitats, and illustrate both the surficial seafloor geology and shallow (to about 100 m) subsurface geology. It is emphasized that the more interpretive habitat and geology data rely on the integration of multiple, new high-resolution datasets and that mapping at small scales would not be possible without such data. This approach and CSMP planning is based in part on recommendations of the Marine Mapping Planning Workshop (Kvitek and others, 2006), attended by coastal and marine managers and scientists from around the state. That workshop established geographic priorities for a coastal mapping project and identified the need for coverage of “lands” from the shore strand line (defined as Mean Higher High Water; MHHW) out to the 3-nautical-mile (5.6-km) limit of California’s State Waters. Unfortunately, surveying the zone from MHHW out to 10-m water depth is not consistently possible using ship-based surveying methods, owing to sea state (for example, waves, wind, or currents), kelp coverage, and shallow rock outcrops. Accordingly, some of the data presented in this series commonly do not cover the zone from the shore out to 10-m depth. This data is part of a series of online U.S. Geological Survey (USGS) publications, each of which includes several map sheets, some explanatory text, and a descriptive pamphlet. Each map sheet is published as a PDF file. Geographic information system (GIS) files that contain both ESRI ArcGIS raster grids (for example, bathymetry, seafloor character) and geotiffs (for example, shaded relief) are also included for each publication. For those who do not own the full suite of ESRI GIS and mapping software, the data can be read using ESRI ArcReader, a free viewer that is available at http://www.esri.com/software/arcgis/arcreader/index.html (last accessed September 20, 2013). The California Seafloor Mapping Program is a collaborative venture between numerous different federal and state agencies, academia, and the private sector. 
CSMP partners include the California Coastal Conservancy, the California Ocean Protection Council, the California Department of Fish and Wildlife, the California Geological Survey, California State University at Monterey Bay’s Seafloor Mapping Lab, Moss Landing Marine Laboratories Center for Habitat Studies, Fugro Pelagos, Pacific Gas and Electric Company, National Oceanic and Atmospheric Administration (NOAA, including National Ocean Service–Office of Coast Surveys, National Marine Sanctuaries, and National Marine Fisheries Service), U.S. Army Corps of Engineers, the Bureau of Ocean Energy Management, the National Park Service, and the U.S. Geological Survey. These web services for the Santa Barbara Channel map area include data layers that are associated with GIS and map sheets available from the USGS CSMP web page at https://walrus.wr.usgs.gov/mapping/csmp/index.html. Each published CSMP map area includes a data catalog of geographic information system (GIS) files; map sheets that contain explanatory text; and an associated descriptive pamphlet. This web service represents the available data layers for this map area. Data was combined from different sonar surveys to generate a comprehensive high-resolution bathymetry and acoustic-backscatter coverage of the map area. These data reveal a range of physiographic features, including exposed bedrock outcrops and large fields of sand waves, as well as many human impacts on the seafloor. To validate geological and biological interpretations of the sonar data, the U.S. Geological Survey towed a camera sled over specific offshore locations, collecting both video and photographic imagery; these “ground-truth” surveying data are available from the CSMP Video and Photograph Portal at https://doi.org/10.5066/F7J1015K. The “seafloor character” data layer shows classifications of the seafloor on the basis of depth, slope, rugosity (ruggedness), and backscatter intensity, further informed by the ground-truth-survey imagery. The “potential habitats” polygons are delineated on the basis of substrate type, geomorphology, seafloor process, or other attributes that may provide a habitat for a specific species or assemblage of organisms. Representative seismic-reflection profile data from the map area is also included and provides information on the subsurface stratigraphy and structure of the map area. The distribution and thickness of young sediment (deposited over the past about 21,000 years, during the most recent sea-level rise) is interpreted on the basis of the seismic-reflection data. The geologic polygons merge onshore geologic mapping (compiled from existing maps by the California Geological Survey) and new offshore geologic mapping that is based on integration of high-resolution bathymetry and backscatter imagery, seafloor-sediment and rock samples, digital camera and video imagery, and high-resolution seismic-reflection profiles. The information provided by the map sheets, pamphlet, and data catalog has a broad range of applications. High-resolution bathymetry, acoustic backscatter, ground-truth-surveying imagery, and habitat mapping all contribute to habitat characterization and ecosystem-based management by providing essential data for delineation of marine protected areas and ecosystem restoration. Many of the maps provide high-resolution baselines that will be critical for monitoring environmental change associated with climate change, coastal development, or other forcings.
High-resolution bathymetry is a critical component for modeling coastal flooding caused by storms and tsunamis, as well as inundation associated with longer term sea-level rise. Seismic-reflection and bathymetric data help characterize earthquake and tsunami sources, critical for natural-hazard assessments of coastal zones. Information on sediment distribution and thickness is essential to the understanding of local and regional sediment transport, as well as the development of regional sediment-management plans. In addition, siting of any new offshore infrastructure (for example, pipelines, cables, or renewable-energy facilities) will depend on high-resolution mapping. Finally, this mapping will both stimulate and enable new scientific research and also raise public awareness of, and education about, coastal environments and issues. Web services were created using an ArcGIS service definition file. The ArcGIS REST service and OGC WMS service include all Santa Barbara Channel map area data layers. Data layers are symbolized as shown on the associated map sheets.
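
    The description mentions that the map-area layers are exposed through ArcGIS REST and OGC WMS services. A hedged sketch of listing layers from such a WMS endpoint with OWSLib follows; the URL below is a placeholder, since the actual service address is not given in this listing.

```python
# Hedged sketch: list advertised layers from an OGC WMS endpoint (placeholder URL).
from owslib.wms import WebMapService

WMS_URL = "https://example.gov/arcgis/services/SantaBarbaraChannel/MapServer/WMSServer"  # placeholder

wms = WebMapService(WMS_URL, version="1.3.0")
for name, layer in wms.contents.items():
    print(name, "-", layer.title)
```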

  20. Research data supporting paper 'An interpretable method for operational modal analysis in time-frequency representation and its applications to railway sleepers'

    • 4tu.edu.hpc.n-helix.com
    Updated Jul 19, 2023
    Cite
    Yuanchen Zeng; Zili Li; Alfredo Núñez (2023). Research data supporting paper 'An interpretable method for operational modal analysis in time-frequency representation and its applications to railway sleepers' [Dataset]. http://doi.org/10.4121/9c932a49-95ae-401f-91b3-b2a5bfc65929.v1
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    4TU.ResearchData
    Authors
    Yuanchen Zeng; Zili Li; Alfredo Núñez
    License

    https://data.4tu.nl/info/fileadmin/user_upload/Documenten/4TU.ResearchData_Restricted_Data_2022.pdf

    Dataset funded by
    ProRail (http://www.prorail.nl/)
    Shift2Rail Joint Undertaking
    EU Horizon 2020 Program for Research and Innovation
    Description

    The data and codes were prepared and uploaded to 4TU.ResearchData by Yuanchen Zeng to support the published results in his paper: Zeng Y, Shen C, Nunez A, Dollevoet R, Zhang W, Li Z. (2023). An interpretable method for operational modal analysis in time-frequency representation and its applications to railway sleepers. Structural Control and Health Monitoring, 2023: 6420772.
