5 datasets found
  1. Supplementary Figure SCA analysis using a manually curated...

    • figshare.com
    zip
    Updated Aug 26, 2020
    Raffaele Calogero (2020). Supplementary Figure SCA analysis using a manually curated cancer-immune-signature (SCA tutorial) [Dataset]. http://doi.org/10.6084/m9.figshare.12867029.v1
    Available download formats: zip
    Dataset updated
    Aug 26, 2020
    Dataset provided by
    figshare
    Authors
    Raffaele Calogero
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset used to assemble the figure "SCA analysis using a manually curated cancer-immune-signature" in the SCAtutorial vignette (https://kendomaniac.github.io/SCAtutorial/articles/SCAvignette.html)

  2. Pokemon data mining 2020

    • kaggle.com
    Updated Jul 31, 2020
    AJ Pass (2020). Pokemon data mining 2020 [Dataset]. https://www.kaggle.com/ajpass/pokemon-data-mining-2020/metadata
    Dataset updated
    Jul 31, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    AJ Pass
    Description

    Context

    This dataset was obtained using a web scraper built in the notebook below, as a learning exercise in data mining and web scraping:

    https://www.kaggle.com/ajpass/data-mining-web-scrapper-vol-1-pokedex

    Content

    This dataset contains the different generations of Pokémon with all their stats.

    Acknowledgements

    This dataset came from what I learned following a tutorial a year ago; because I couldn't find that tutorial again, I made a version from what I remembered.

  3. Augmenting geovisual analytics of social media data with heterogeneous...

    • plos.figshare.com
    • figshare.com
    docx
    Updated Jun 2, 2023
    Alexander Savelyev; Alan M. MacEachren (2023). Augmenting geovisual analytics of social media data with heterogeneous information network mining—Cognitive plausibility assessment [Dataset]. http://doi.org/10.1371/journal.pone.0206906
    Available download formats: docx
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Alexander Savelyev; Alan M. MacEachren
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper investigates the feasibility, from a user perspective, of integrating a heterogeneous information network mining (HINM) technique into SensePlace3 (SP3), a web-based geovisual analytics environment. The core contribution of this paper is a user study that determines whether an analyst with minimal background can comprehend the network data modeling metaphors employed by the resulting system, whether they can employ said metaphors to explore spatial data, and whether they can interpret the results of such spatial analysis correctly. This study confirms that all of the above is, indeed, possible, and provides empirical evidence about the importance of a hands-on tutorial and a graphical approach to explaining data modeling metaphors in the successful adoption of advanced data mining techniques. Analysis of outcomes of data exploration by the study participants also demonstrates the kinds of insights that a visual interface to HINM can enable. A second contribution is a realistic case study that demonstrates that our HINM approach (made accessible through a visual interface that provides immediate visual feedback for user queries), produces a clear and a positive difference in the outcome of spatial analysis. Although this study does not aim to validate HINM as a data modeling approach (there is considerable evidence for this in existing literature), the results of the case study suggest that HINM holds promise in the (geo)visual analytics domain as well, particularly when integrated into geovisual analytics applications. A third contribution is a user study protocol that is based on and improves upon the current methodological state of the art. This protocol includes a hands-on tutorial and a set of realistic data analysis tasks. Detailed evaluation protocols are rare in geovisual analytics (and in visual analytics more broadly), with most studies reviewed in this paper failing to provide sufficient details for study replication or comparison work.

  4. Visual Data Mining of Biological Networks: One Size Does Not Fit All

    • plos.figshare.com
    xml
    Updated May 31, 2023
    Chiara Pastrello; David Otasek; Kristen Fortney; Giuseppe Agapito; Mario Cannataro; Elize Shirdel; Igor Jurisica (2023). Visual Data Mining of Biological Networks: One Size Does Not Fit All [Dataset]. http://doi.org/10.1371/journal.pcbi.1002833
    Available download formats: xml
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Chiara Pastrello; David Otasek; Kristen Fortney; Giuseppe Agapito; Mario Cannataro; Elize Shirdel; Igor Jurisica
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    High-throughput technologies produce massive amounts of data. However, individual methods yield data specific to the technique used and biological setup. The integration of such diverse data is necessary for the qualitative analysis of information relevant to hypotheses or discoveries. It is often useful to integrate these datasets using pathways and protein interaction networks to get a broader view of the experiment. The resulting network needs to be able to focus on either the large-scale picture or on the more detailed small-scale subsets, depending on the research question and goals. In this tutorial, we illustrate a workflow useful to integrate, analyze, and visualize data from different sources, and highlight important features of tools to support such analyses.

  5. Data from: Dataset for Vector space model and the usage patterns of...

    • figshare.com
    bin
    Updated May 30, 2023
    Gede Primahadi Wijaya Rajeg; Karlina Denistia; Simon Musgrave (2023). Dataset for Vector space model and the usage patterns of Indonesian denominal verbs [Dataset]. http://doi.org/10.6084/m9.figshare.8187155.v1
    Available download formats: bin
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Gede Primahadi Wijaya Rajeg; Karlina Denistia; Simon Musgrave
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Preface

    This is the data repository for the paper accepted for publication in NUSA's special issue on Linguistic studies using large annotated corpora (co-edited by Hiroki Nomoto and David Moeljadi).

    How to cite the dataset

    If you use, adapt, and/or modify any of the dataset in this repository for your research or teaching purposes (except for the malindo_dbase, see below), please cite as:

    Rajeg, Gede Primahadi Wijaya; Denistia, Karlina; Musgrave, Simon (2019): Dataset for Vector space model and the usage patterns of Indonesian denominal verbs. figshare. Fileset. https://doi.org/10.6084/m9.figshare.8187155.

    Alternatively, click on the dark pink Cite button to browse different citation style (default is DataCite).

    The malindo_dbase data in this repository is from Nomoto et al. (2018) (cf the GitHub repository). So please also cite their work if you use it for your research:

    Nomoto, Hiroki, Hannah Choi, David Moeljadi and Francis Bond. 2018. MALINDO Morph: Morphological dictionary and analyser for Malay/Indonesian. Kiyoaki Shirai (ed.) Proceedings of the LREC 2018 Workshop "The 13th Workshop on Asian Language Resources", 36-43.

    Tutorial on how to use the data together with the R Markdown Notebook for the analyses is available on GitHub and figshare:

    Rajeg, Gede Primahadi Wijaya; Denistia, Karlina; Musgrave, Simon (2019): R Markdown Notebook for Vector space model and the usage patterns of Indonesian denominal verbs. figshare. Software. doi: https://doi.org/10.6084/m9.figshare.9970205

    Dataset description

    1. Leipzig_w2v_vector_full.bin is the vector space model used in the paper. We built it using wordVectors package (Schmidt & Li 2017) via the MonARCH High Performance Computing Cluster (We thank Philip Chan for his help with access to MonARCH).
    2. Files beginning with ngramexmpl_... are data for the n-grams (i.e. words sequence) of verbs discussed in the paper. The files are in tab-separated format.
    3. Files beginning with sentence_... are full sentences for the verbs discussed in the paper (in the plain text format and R dataset format [.rds]). Information of the corpus file and sentence number in which the verb is found are included.
    4. me_parsed_nountaggedbase (in three different file-formats) contains database of the me- words with noun-tagged root that MorphInd identified to occur in three morphological schemas we focus on (me-, me-/-kan, and me-/-i). The database has columns for the verbs' token frequency in the corpus, root forms, MorphInd parsing output, among others.
    5. wordcount_leipzig_allcorpus (in three different file-formats) contains information on the size of each corpus file used in the paper and from which the vector space model is built.
    6. wordlist_leipzig_ME_DI_TER_percorpus.tsv is a tab-separated frequency list of words prefixed with me-, di-, and ter- in all thirteen corpus files used. The wordlist is built by first tokenising each corpus file, lowercasing the tokens, and then extracting the words with the corresponding three prefixes using the following regular expressions:
       - For me-: ^(?i)(me)([a-z-]{3,})$
       - For di-: ^(?i)(di)([a-z-]{3,})$
       - For ter-: ^(?i)(ter)([a-z-]{3,})$
    7. malindo_dbase is the MALINDO Morphological Dictionary (see above).

    References

    Schmidt, Ben & Jian Li. 2017. wordVectors: Tools for creating and analyzing vector-space models of texts. R package. http://github.com/bmschmidt/wordVectors.
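
The wordlist procedure described in the dataset description (tokenise, lowercase, then extract prefixed words) can be sketched roughly as follows. This is only an illustration in Python, not the authors' code: the whitespace tokeniser, the function name `prefixed_word_counts`, and the sample sentence are assumptions made for the example. The inline `(?i)` flag from the original patterns is replaced with `re.IGNORECASE`, because recent Python versions reject a global inline flag that does not sit at the very start of the pattern.

```python
import re
from collections import Counter

# Patterns adapted from the dataset description. The original inline (?i)
# flag is expressed as re.IGNORECASE (an adaptation for Python).
PREFIX_PATTERNS = {
    "me": re.compile(r"^(me)([a-z-]{3,})$", re.IGNORECASE),
    "di": re.compile(r"^(di)([a-z-]{3,})$", re.IGNORECASE),
    "ter": re.compile(r"^(ter)([a-z-]{3,})$", re.IGNORECASE),
}


def prefixed_word_counts(text):
    """Return a frequency Counter per prefix for tokens matching its pattern."""
    # Hypothetical tokeniser: split on whitespace, strip surrounding
    # punctuation, and lowercase. The tokeniser used for the dataset may differ.
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    counts = {prefix: Counter() for prefix in PREFIX_PATTERNS}
    for token in tokens:
        for prefix, pattern in PREFIX_PATTERNS.items():
            if pattern.match(token):
                counts[prefix][token] += 1
    return counts


# Made-up sample sentence, used only to exercise the patterns.
sample = "Mereka membaca buku dan menulis surat. Buku itu dibaca dan terbawa."
counts = prefixed_word_counts(sample)
for prefix, counter in counts.items():
    print(prefix, dict(counter))
```

Because the patterns are purely orthographic, any token that starts with the prefix and continues with at least three letters or hyphens is counted. In the sample, the pronoun "mereka" matches the me- pattern even though it is not a prefixed verb, which suggests why a separate morphological parse (such as the MorphInd output mentioned in item 4) is useful.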


