71 datasets found
  1. One Hundred Cities

    • kaggle.com
    Updated Apr 29, 2021
    Cite
    Chiticariu Cristian (2021). One Hundred Cities [Dataset]. https://www.kaggle.com/datasets/chiticariucristian/one-hundred-cities/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 29, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Chiticariu Cristian
    Description

    Context

    100 cities

    Content

    The dataset consists of one hundred cities around the world, with a short description and the population of each.

    Acknowledgements

    The data was extracted from https://www.bestcities.org/rankings/worlds-best-cities/

  2. cifar100

    • tensorflow.org
    • universe.roboflow.com
    • +3 more
    Updated Jun 1, 2024
    + more versions
    Cite
    (2024). cifar100 [Dataset]. https://www.tensorflow.org/datasets/catalog/cifar100
    Explore at:
    Dataset updated
    Jun 1, 2024
    Description

    This dataset is just like CIFAR-10, except that it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs).

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split and print the first four examples.
    ds = tfds.load('cifar100', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/cifar100-3.0.2.png
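
    As a quick sanity check on the figures in the description, the sketch below derives the split sizes from the per-class counts. It uses only numbers stated in the text. The count of classes per superclass is not stated there; it is an assumption that the 100 classes are spread evenly over the 20 superclasses.

```python
# Counts taken from the dataset description.
NUM_CLASSES = 100
TRAIN_PER_CLASS = 500
TEST_PER_CLASS = 100
NUM_SUPERCLASSES = 20


def cifar100_sizes():
    """Return the split sizes implied by the per-class counts."""
    train = NUM_CLASSES * TRAIN_PER_CLASS
    test = NUM_CLASSES * TEST_PER_CLASS
    return {
        "train": train,
        "test": test,
        "total": train + test,
        "images_per_class": TRAIN_PER_CLASS + TEST_PER_CLASS,
        # Assumption: classes are spread evenly over the superclasses.
        "classes_per_superclass": NUM_CLASSES // NUM_SUPERCLASSES,
    }


print(cifar100_sizes())
```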

  3. SOS-Training-Data-Visualization

    • huggingface.co
    Updated May 23, 2025
    Cite
    Weikai Huang (2025). SOS-Training-Data-Visualization [Dataset]. https://huggingface.co/datasets/weikaih/SOS-Training-Data-Visualization
    Explore at:
    Dataset updated
    May 23, 2025
    Authors
    Weikai Huang
    Description

    weikaih/SOS-Training-Data-Visualization dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. Demonstrations of witness visualization using the Witness Visualizer Tool

    • data.niaid.nih.gov
    Updated Aug 16, 2024
    Cite
    Mordan, Vitalii (2024). Demonstrations of witness visualization using the Witness Visualizer Tool [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10817852
    Explore at:
    Dataset updated
    Aug 16, 2024
    Dataset authored and provided by
    Mordan, Vitalii
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We have three datasets that display visualized SVCOMP witnesses generated with the help of the Witness Visualizer tool. Each dataset comprises two directories: witnesses, which contains the original witnesses provided by SVCOMP tools, and visualization, which contains our visual representations of the respective witnesses in HTML format. The visualization file name contains the prefix error_trace-, for example, error_trace-witness.2ls.html corresponds to a witness named witness.2ls.graphml.
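
    The naming convention described above can be sketched as a small helper. The function name `visualization_name` is hypothetical and not part of the Witness Visualizer tool; the sketch only encodes the `error_trace-` prefix and the `.graphml` to `.html` substitution given in the text.

```python
from pathlib import PurePosixPath


def visualization_name(witness_file: str) -> str:
    """Map a witness file name to the name of its HTML visualization."""
    name = PurePosixPath(witness_file).name
    suffix = ".graphml"
    if not name.endswith(suffix):
        raise ValueError(f"not a GraphML witness: {witness_file}")
    stem = name[: -len(suffix)]
    # The visualization keeps the witness stem and adds the error_trace- prefix.
    return f"error_trace-{stem}.html"


print(visualization_name("witnesses/witness.2ls.graphml"))
```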

    1. Overall thoroughness for all SVCOMP tools (dataset_1.zip)

    This dataset includes a single random witness for each SVCOMP tool, accompanied by its corresponding visualization. The visualizations showcase the various witness elements such as function calls, conditions, assumptions, thread specifics, and other operations. Cells marked with +/- indicate that some elements were present in the error trace, but not all of them. All witnesses are presented in the table below:

    Witness SV-COMP Tool Function calls Threads Assumptions Conditions Link to sources

    witness.2ls.graphml 2LS - + +

    witness.aprove.graphml AProVE (2022) - - + +

    witness.brick.graphml BRICK - + +

    witness.bubaak.graphml Bubaak - + +

    witness.cbmc.graphml CBMC + + +

    witness.cpa-bam-bnb.graphml CPA-BAM-BnB + + + +

    witness.cpa-bam-smg.graphml CPA-BAM-SMG + + + +

    witness.cpalockator.graphml CPALockator + + + + +

    witness.cpachecker.graphml CPAChecker + + + + +

    witness.crux.graphml Crux - + +

    witness.cseq.graphml Cseq + + + +

    witness.dartagnan.graphml Dartagnan + - +

    witness.deagle.graphml Deagle + + +

    - DIVINE (until 2022) empty

    - EBF empty

    witness.esbmc-incr.graphml ESBMC-incr + + +

    witness.esbmc-kind.graphml ESBMC-kind + + +

    - Frama-C-SV empty

    witness.gazer-theta.graphml Gazer-Theta + + wrong path

    witness.gdart.graphml Gdart-LLVM - + +

    - Goblint empty

    witness.graves_cpa.graphml Graves-CPA + + + + +

    witness.graves_par.graphml Graves-Par + + + + +

    - Infer empty

    witness.korn.graphml Korn - + +

    witness.lart.graphml LART (2022) - + +

    witness.lazy-cseq.graphml Lazy-CSeq + + + + +

    witness.lfchecker.graphml LF-checker + + +

    - Locksmith empty

    - Mopsa empty

    witness.pesco_cpa.graphml PeSCo-CPA + + + + +

    witness.pichecker.graphml PIChecker + + + +

    witness.pinaka.graphml Pinaka - + +

    witness.predator.graphml PredatorHP - - - +

    - SESL (2022) empty

    witness.smack.graphml SMACK (until 2022) - + +

    witness.symbiotic.graphml Symbiotic + + +

    witness.theta.graphml Theta different format

    witness.uatomozer.graphml UAutomizer +/- + + + +

    witness.ucutter.graphml UgemCutter +/- + + + +

    witness.ukojak.graphml UKojak +/- + + +

    witness.utaipan.graphml UTaipan +/- + + + +

    witness.veriabs.graphml VeriAbs - + wrong path

    witness.veriabsl.graphml VeriAbsL + + + wrong path

    witness.verifuzz.graphml VeriFuzz - + +

    witness.verioover.graphml VeriOover - + +

    2. Thoroughness by property (dataset_2.zip)

    This dataset comprises a selected witness for each SVCOMP property (ReachSafety, MemSafety, Termination, NoOverflow, ConcurrencySafety). The witnesses are presented in the following table:

    Witness SV-COMP Tool Property Mandatory elements Description

    witness.smg_memory.graphml CPA-BAM-SMG MemSafety Assumptions / conditions, function calls There is a double free operation. Employing function calls append aids in comprehending the structure of the list, while assumptions reveal which branch was chosen.

    witness.graves_overflow.graphml Graves-CPA NoOverflow Assumptions / conditions The witness shows an explicit value (-2147483648, the minimum value of the int type) that can cause an overflow in a specific program.

    witness.cpachecker_termination.graphml CPAChecker NoTermination Assumptions / conditions There is a condition leading to an infinite loop.

    witness.cpachecker_unreach.graphml CPAChecker ReachSafety Function calls The error trace indicates a potential scenario where a mutex was unlocked without the corresponding mutex_unlock operation.

    witness.cpachecker_conc.graphml CPAChecker ConcurrencySafety Function calls, thread operations The error trace illustrates the creation of threads and highlights the assignments made within each thread that ultimately resulted in the violation of the property.

    3. Known bug (dataset_3.zip)

    This dataset contains witnesses for a known bug from SVCOMP (linux-3.14--drivers--usb--misc--adutux.ko.cil.i) involving a data race on dev->udev, where concurrent writes occur without corresponding locks. Only two tools were able to solve the corresponding verification task: ESBMC-kind and CPALockator. The ESBMC error trace (witness.esbmc_2020.graphml) includes only thread specifics and assumptions, while the CPALockator witness (witness.lockator.graphml) comprises all witness elements and is presented in a human-readable format.

    4. Comparison with the validation rate

    This section presents a comparison between witness thoroughness and the actual validation rate for each property. We considered all tools that participated in the respective category and generated at least 10 error traces, then calculated the validation rate. This comparison demonstrates how effectively thoroughness can approximate the validation rate. The following tables provide details for each property, with the relevant elements used to calculate thoroughness highlighted:

    MemSafety property:

    SV-COMP Tool Function calls Threads Assumptions Conditions Thoroughness Error traces Validation rate

    Bubaak 0 0 1 0 33.33 64 67.19

    CBMC 0 1 1 0 33.33 27 11.11

    CPA-BAM-SMG 1 0 1 1 100 46 78.26

    CPAChecker 1 1 1 1 100 37 67.57

    ESBMC-kind 0 1 1 0 33.33 25 20

    Graves-CPA 1 1 1 1 100 44 56.82

    Graves-Par 1 1 1 1 100 18 77.78

    PeSCo-CPA 1 1 1 1 100 37 67.57

    NoOverflow property:

    SV-COMP Tool Function calls Threads Assumptions Conditions Thoroughness Error traces Validation rate

    2LS 0 0 1 0 100 2071 95.7

    Bubaak 0 0 1 0 100 2233 94.67

    CBMC 0 1 1 0 100 3296 62.14

    CPAChecker 1 1 1 1 100 196 100

    Crux 0 0 1 0 100 222 95.05

    ESBMC-kind 0 1 1 0 100 3296 66.69

    Frama-C-SV 0 0 0 0 0 676 0

    Graves-Par 1 1 1 1 100 750 2

    Infer 0 0 0 0 0 583 0

    Pinaka 0 0 1 0 100 2232 100

    Symbiotic 0 1 1 0 100 1418 100

    UAutomizer 0.5 1 1 1 100 2222 100

    UKojak 0.5 0 1 1 100 168 100

    UTaipan 0.5 1 1 1 100 0 100

    VeriFuzz 0 0 1 0 100 185 90.81

    NoTermination property:

    SV-COMP Tool Function calls Threads Assumptions Conditions Thoroughness Error traces Validation rate

    2LS 0 0 1 0 50 663 69.08

    Bubaak 0 0 1 0 50 578 34.78

    CPAChecker 1 1 1 1 100 501 97.01

    Symbiotic 0 1 1 0 50 591 52.96

    UAutomizer 0.5 1 1 1 100 512 98.24

    VeriFuzz 0 0 1 0 50 492 71.34

    ReachSafety property:

    SV-COMP Tool Function calls Threads Assumptions Conditions Thoroughness Error traces Validation rate

    Bubaak 0 0 1 0 0 24 54.17

    CBMC 0 1 1 0 0 392 1.28

    CPA-BAM-BnB 1 0 1 1 100 69 85.51

    CPA-BAM-SMG 1 0 1 1 100 67 85.07

    CPAChecker 1 1 1 1 100 45 88.89

    Crux 0 0 1 0 0 1572 0.13

    ESBMC-kind 0 1 1 0 0 64 21.88

    Graves-CPA 1 1 1 1 100 66 87.88

    Graves-Par 1 1 1 1 100 24 58.33

    PeSCo-CPA 1 1 1 1 100 63 85.71

    ConcurrencySafety property:

    SV-COMP Tool Function calls Threads Assumptions Conditions Thoroughness Error traces Validation rate

    CBMC 0 1 1 0 50 277 87

    CPA-Lockator 1 1 1 1 100 83 26.51

    CPAChecker 1 1 1 1 100 257 100

    Cseq 1 1 1 0 100 277 94.58

    Dartagnan 0 1 0 0 50 281 92.17

    Deagle 0 1 1 0 50 280 96.07

    DIVINE 0 0 0 0 0 230 80.87

    EBF 0 0 0 0 0 282 89.01

    ESBMC-incr 0 1 1 0 50 68 79.41

    ESBMC-kind 0 1 1 0 50 263 89.73

    Graves-CPA 1 1 1 1 100 261 99.23

    Graves-Par 1 1 1 1 100 28 100

    Infer 0 0 0 0 0 634 0

    Lazy-CSeq 1 1 1 1 100 274 94.89

    LF-checker 0 1 1 0 50 286 85.31

    PeSCo-CPA 1 1 1 1 100 256 100

    PIChecker 1 0 1 1 50 269 98.14

    Symbiotic 0 1 1 0 50 110 92.73

    UAutomizer 0.5 1 1 1 75 297 94.95

    UgemCutter 0.5 1 1 1 75 283 96.47

    UTaipan 0.5 1 1 1 75 293 96.25

    5. Overall distance for all possible combinations of elements for thoroughness

    This section presents the overall difference (i.e., the sum of differences between witness thoroughness and validation rates for each tool) when thoroughness is calculated based on all possible combinations of witness elements (assumptions, conditions, thread specifics, and function calls). The set of witnesses is the same as in the previous section. The following tables provide details for each property, with the minimum difference highlighted:

    MemSafety property:

    Combination Overall difference

    Function calls 250.3

    Thread specifics 444.6

    Assumptions 353.7

    Conditions 250.3

    Function calls, Thread specifics 294.6

    Assumptions, Function calls 238.08

    Conditions, Function calls 250.3

    Assumptions, Thread specifics 344.6

    Conditions, Thread specifics 294.6

    Assumptions, Conditions 238.08

    Assumptions, Function calls, Thread specifics 277.94

    Conditions, Function calls, Thread specifics 244.59

    Assumptions, Conditions, Function calls 221.41

    Assumptions, Conditions, Thread specifics 277.94

    Assumptions, Conditions, Function calls, Thread specifics 244.6
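
    The MemSafety figures above can be recomputed with a short sketch. The formulas are assumptions inferred from the tables, not a definition given by the authors: thoroughness is taken as the share of selected elements present in a tool's witnesses, times 100, and the overall difference as the sum of absolute gaps between thoroughness and validation rate. The element flags and validation rates are copied from the MemSafety table in the previous section; function and variable names are hypothetical. Under these assumptions the sketch selects Function calls, Assumptions and Conditions with an overall difference of 221.41, matching the minimum listed above.

```python
from itertools import combinations

ELEMENTS = ("Function calls", "Threads", "Assumptions", "Conditions")

# tool -> (element flags in ELEMENTS order, validation rate in percent)
MEMSAFETY = {
    "Bubaak": ((0, 0, 1, 0), 67.19),
    "CBMC": ((0, 1, 1, 0), 11.11),
    "CPA-BAM-SMG": ((1, 0, 1, 1), 78.26),
    "CPAChecker": ((1, 1, 1, 1), 67.57),
    "ESBMC-kind": ((0, 1, 1, 0), 20.0),
    "Graves-CPA": ((1, 1, 1, 1), 56.82),
    "Graves-Par": ((1, 1, 1, 1), 77.78),
    "PeSCo-CPA": ((1, 1, 1, 1), 67.57),
}


def thoroughness(flags, selected):
    """Assumed formula: percentage of the selected elements that are present."""
    indices = [ELEMENTS.index(element) for element in selected]
    return 100.0 * sum(flags[i] for i in indices) / len(indices)


def overall_difference(table, selected):
    """Assumed formula: sum of |thoroughness - validation rate| over all tools."""
    return sum(
        abs(thoroughness(flags, selected) - rate) for flags, rate in table.values()
    )


def best_combination(table):
    """Try every non-empty combination of elements and keep the closest one."""
    candidates = [
        combo
        for size in range(1, len(ELEMENTS) + 1)
        for combo in combinations(ELEMENTS, size)
    ]
    return min(candidates, key=lambda combo: overall_difference(table, combo))


best = best_combination(MEMSAFETY)
print(best, round(overall_difference(MEMSAFETY, best), 2))
```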

    NoOverflow property:

    Combination Overall difference

    Function calls 953.06

    Thread specifics 745.4

    Assumptions 192.94

    Conditions 803.06

    Function calls, Thread specifics 778.06

    Assumptions, Function calls 478.06

    Conditions, Function calls 878.06

    Assumptions, Thread specifics 445.4

    Conditions, Thread specifics 703.06

    Assumptions, Conditions 403.06

    Assumptions, Function calls, Thread specifics 528.8

    Conditions, Function calls, Thread specifics 786.41

    Assumptions, Conditions, Function calls 586.43

    Assumptions, Conditions, Thread specifics 478.79

    Assumptions, Conditions, Function calls, Thread specifics 590.56

    NoTermination property:

    Combination Overall

  5. Top 100 popular movies from 2003 to 2022 (iMDB)

    • kaggle.com
    Updated Mar 15, 2023
    Cite
    George Scutelnicu (2023). Top 100 popular movies from 2003 to 2022 (iMDB) [Dataset]. https://www.kaggle.com/datasets/georgescutelnicu/top-100-popular-movies-from-2003-to-2022-imdb/data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 15, 2023
    Dataset provided by
    Kaggle
    Authors
    George Scutelnicu
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The dataset contains the 100 most popular movies for each year from 2003 to 2022. The data is ideal for exploratory data analysis. All of the information was collected by web scraping and can be found on iMDB.

    The dataset contains:

    - Title
    - Rating
    - Year
    - Month
    - Certificate
    - Runtime
    - Director/s
    - Stars
    - Genre/s
    - Filming Location
    - Budget
    - Income
    - Country of Origin

  6. Parameters of the studied CNN models.

    • plos.figshare.com
    xls
    Updated Mar 22, 2024
    Cite
    Jiasheng Yang; Guanfang Wang; Xu Xiao; Meihua Bao; Geng Tian (2024). Parameters of the studied CNN models. [Dataset]. http://doi.org/10.1371/journal.pone.0296175.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Mar 22, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Jiasheng Yang; Guanfang Wang; Xu Xiao; Meihua Bao; Geng Tian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The accuracy and interpretability of artificial intelligence (AI) are crucial for the advancement of optical coherence tomography (OCT) image detection, as it can greatly reduce the manual labor required by clinicians. By prioritizing these aspects during development and application, we can make significant progress towards streamlining the clinical workflow. In this paper, we propose an explainable ensemble approach that utilizes transfer learning to detect fundus lesion diseases through OCT imaging. Our study utilized a publicly available OCT dataset consisting of normal subjects, patients with dry age-related macular degeneration (AMD), and patients with diabetic macular edema (DME), each with 15 samples. The impact of pre-trained weights on the performance of individual networks was first compared, and then these networks were ensembled using majority soft polling. Finally, the features learned by the networks were visualized using Grad-CAM and CAM. The use of pre-trained ImageNet weights improved the performance from 68.17% to 92.89%. The ensemble model consisting of the three CNN models with pre-trained parameters loaded performed best, correctly distinguishing between AMD patients, DME patients and normal subjects 100% of the time. Visualization results showed that Grad-CAM could display the lesion area more accurately. The results demonstrate that the proposed approach can achieve both good accuracy and good interpretability in retinal OCT image detection.

  7. Data Visualization in Social Work Research

    • search.dataone.org
    • dataverse.harvard.edu
    • +1 more
    Updated Nov 21, 2023
    Cite
    Rothwell, David; Esposito, Tonino; Wegner-Lohin (2023). Data Visualization in Social Work Research [Dataset]. http://doi.org/10.7910/DVN/I6IIXL
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Rothwell, David; Esposito, Tonino; Wegner-Lohin
    Time period covered
    Jan 1, 2009 - Jan 1, 2012
    Description

    Research dissemination and knowledge translation are imperative in social work. Methodological developments in data visualization techniques have improved the ability to convey meaning and reduce erroneous conclusions. The purpose of this project is to examine: (1) How are empirical results presented visually in social work research?; (2) To what extent do top social work journals vary in the publication of data visualization techniques?; (3) What is the predominant type of analysis presented in tables and graphs?; (4) How can current data visualization methods be improved to increase understanding of social work research? Method: A database was built from a systematic literature review of the four most recent issues of Social Work Research and 6 other highly ranked journals in social work based on the 2009 5-year impact factor (Thomson Reuters ISI Web of Knowledge). Overall, 294 articles were reviewed. Articles without any form of data visualization were not included in the final database. The number of articles reviewed by journal includes: Child Abuse & Neglect (38), Child Maltreatment (30), American Journal of Community Psychology (31), Family Relations (36), Social Work (29), Children and Youth Services Review (112), and Social Work Research (18). Articles with any type of data visualization (table, graph, other) were included in the database and coded sequentially by two reviewers based on the type of visualization method and type of analyses presented (descriptive, bivariate, measurement, estimate, predicted value, other). Additional review was required from the entire research team for 68 articles. Codes were discussed until 100% agreement was reached. The final database includes 824 data visualization entries.

  8. cifar100_n

    • tensorflow.org
    Updated Aug 11, 2023
    + more versions
    Cite
    (2023). cifar100_n [Dataset]. https://www.tensorflow.org/datasets/catalog/cifar100_n
    Explore at:
    Dataset updated
    Aug 11, 2023
    Description

    A re-labeled version of CIFAR-100 with real human annotation errors. For every pair (image, label) in the original CIFAR-100 train set, it provides an additional label given by a real human annotator.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split and print the first four examples.
    ds = tfds.load('cifar100_n', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/cifar100_n-1.0.1.png
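
    Because each training image carries both its original label and a human-provided one, one natural use is to estimate how often the two disagree. The sketch below shows that calculation on plain Python lists. The toy label values and the function name `label_noise_rate` are hypothetical and are not taken from the dataset or from tensorflow_datasets.

```python
def label_noise_rate(original_labels, human_labels):
    """Return the fraction of examples whose human label differs from the original."""
    if len(original_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not original_labels:
        raise ValueError("label lists must not be empty")
    mismatches = sum(o != h for o, h in zip(original_labels, human_labels))
    return mismatches / len(original_labels)


# Hypothetical toy labels, for illustration only.
original = [3, 7, 7, 42, 99]
human = [3, 7, 8, 42, 12]
print(label_noise_rate(original, human))
```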

  9. Founders in e-commerce and fintech dominated the RoW100 2022 list - Chart

    • restofworld.org
    Updated May 11, 2022
    Cite
    Rest of World (2022). Founders in e-commerce and fintech dominated the RoW100 2022 list - Chart [Dataset]. https://restofworld.org/charts/2022/Zb1SZ-founders-ecommerce-fintech-dominated-row100-2022
    Explore at:
    Dataset updated
    May 11, 2022
    Dataset authored and provided by
    Rest of World
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A data visualization representing Founders in e-commerce and fintech dominated the RoW100 2022 list

  10. Performance of three CNNs models and ensemble model with pretraining to each...

    • plos.figshare.com
    xls
    Updated Mar 22, 2024
    + more versions
    Cite
    Jiasheng Yang; Guanfang Wang; Xu Xiao; Meihua Bao; Geng Tian (2024). Performance of three CNNs models and ensemble model with pretraining to each class (Mean±SD). [Dataset]. http://doi.org/10.1371/journal.pone.0296175.t006
    Explore at:
    Available download formats: xls
    Dataset updated
    Mar 22, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Jiasheng Yang; Guanfang Wang; Xu Xiao; Meihua Bao; Geng Tian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance of three CNNs models and ensemble model with pretraining to each class (Mean±SD).

  11. Percentage of population with mobile internet subscriptions - Chart

    • restofworld.org
    Updated Oct 31, 2024
    Cite
    Rest of World (2024). Percentage of population with mobile internet subscriptions - Chart [Dataset]. https://restofworld.org/charts/2024/scTzv-percentage-population-mobile-internet-subscriptions
    Explore at:
    Dataset updated
    Oct 31, 2024
    Dataset authored and provided by
    Rest of World
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Nearly 100% of Singapore’s population has a mobile internet subscription, while in Bangladesh, Nigeria, and Pakistan it is below 50%.

  12. Materials for 2d representation of the HathiTrust Library

    • zenodo.org
    application/gzip, bin
    Updated Jan 24, 2020
    Cite
    Benjamin M Schmidt (2020). Materials for 2d representation of the HathiTrust Library [Dataset]. http://doi.org/10.5281/zenodo.1477018
    Explore at:
    Available download formats: application/gzip, bin
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Benjamin M Schmidt
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Materials to create the LargeVis visualization online at http://creatingdata.us/datasets/hathi-features/, and described in Benjamin Schmidt, "Stable random projection: lightweight, general-purpose dimensionality reduction for digitized libraries," Journal of Cultural Analytics. October 3, 2018.

    Two items. First, `hathi_pca.bin`: a binary file with 100-dimensional representations of the complete Hathi Trust Extended Features set. These began as 1280-dimensional SRP features, and were reduced to 100 dimensions using a PCA transformation matrix derived using a random sample of the full 13 million book set. Vectors were reduced to unit length before PCA, but not afterwards; this means that, in general, their length gives some sense of how much information was lost in the PCA representation. This can be read using the code at https://github.com/bmschmidt/pySRP, or anything that reads word2vec-formatted vectors. Includes HathiTrust identifiers.

    Second, `hathi.tsv.gz`: a row-oriented set containing a variety of metadata fields for each set, including (as 'x' and 'y') the coordinates of a 2-d LargeVis visualization. This is the immediate input to the visualization at http://creatingdata.us/datasets/hathi-features/. Columns should be relatively straightforward; they are derived from the HathiTrust MARC records, which can be accessed through Hathi's public API. Classification codes ('lc1') use the Library of Congress classification; they represent the subclass (generally two characters, though it can be one or three). The first character alone represents the LC class and can be useful for coloring high-level overviews.

    These two files can be merged through the Hathi Trust identifier present in both.
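
    The remark that post-PCA vector length hints at information loss can be illustrated with a toy sketch. It assumes the retained PCA axes are orthonormal and ignores mean-centering. The 4-dimensional vectors and the two axes below are hypothetical stand-ins for the 1280-dimensional inputs and 100 retained components, and nothing here reads the actual files.

```python
import math


def norm(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vector))


def to_unit(vector):
    """Scale a vector to unit length, as was done before PCA."""
    length = norm(vector)
    return [x / length for x in vector]


def project(vector, axes):
    """Coordinates of a vector along each retained (orthonormal) axis."""
    return [sum(a * b for a, b in zip(vector, axis)) for axis in axes]


s = 1 / math.sqrt(2)
# Two hypothetical orthonormal axes kept out of four dimensions.
AXES = [[s, s, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

well_captured = to_unit([1.0, 1.0, 0.0, 0.0])     # lies inside the retained subspace
poorly_captured = to_unit([1.0, -1.0, 1.0, 1.0])  # mostly outside it

for name, vec in [("well", well_captured), ("poorly", poorly_captured)]:
    # A reduced length close to 1 means little was lost; a shorter one means more.
    print(name, round(norm(project(vec, AXES)), 3))
```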

  13. coil100

    • tensorflow.org
    Updated Nov 23, 2022
    Cite
    (2022). coil100 [Dataset]. https://www.tensorflow.org/datasets/catalog/coil100
    Explore at:
    Dataset updated
    Nov 23, 2022
    Description

    The dataset contains 7200 color images of 100 objects (72 images per object). The objects have a wide variety of complex geometric and reflectance characteristics. The objects were placed on a motorized turntable against a black background. The turntable was rotated through 360 degrees to vary object pose with respect to a fixed color camera. Images of the objects were taken at pose intervals of 5 degrees. This corresponds to 72 poses per object.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split and print the first four examples.
    ds = tfds.load('coil100', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/coil100-2.0.0.png
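
    The pose arithmetic in the description can be checked with a few lines. This is only a sketch: it assumes poses start at 0 degrees, which the text does not state, and the function name is hypothetical.

```python
NUM_OBJECTS = 100
STEP_DEGREES = 5


def pose_angles(step_degrees=STEP_DEGREES):
    """Turntable angles for one full rotation, assuming the first pose is at 0 degrees."""
    return list(range(0, 360, step_degrees))


angles = pose_angles()
print(len(angles), "poses per object")
print(NUM_OBJECTS * len(angles), "images in total")
```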

  14. Student Grades

    • kaggle.com
    Updated Mar 6, 2025
    Cite
    clemence travers (2025). Student Grades [Dataset]. https://www.kaggle.com/datasets/clemencetravers/student-grades/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 6, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    clemence travers
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This Dataset comes from Mahmoud Elhemaly.

    I've modified this dataset since it had no correlation between the variables. I've used it for data visualization in Tableau. Many columns contain inaccurate data.

    Description

    Student Performance & Behavior Dataset: this dataset contains 5,000 real records collected from a private learning provider. It includes key attributes necessary for exploring patterns, correlations, and insights related to academic performance.

    Columns:

    Student_ID: Unique identifier for each student.

    First_Name: Student’s first name.

    Last_Name: Student’s last name.

    Email: Contact email (can be anonymized).

    Gender: Male, Female, Other.

    Age: The age of the student.

    Department: Student's department (e.g., CS, Engineering, Business).

    Attendance (%): Attendance percentage (0-100%).

    Participation_Score: Score based on class participation (0-10).

    Projects_Score: Project evaluation score (out of 100).

    Total_Score: Weighted sum of all grades.

    Grade: Letter grade (A, B, C, D, F).

    Study_Hours_per_Week: Average study hours per week.

    Extracurricular_Activities: Whether the student participates in extracurriculars (Yes/No).

    Internet_Access_at_Home: Does the student have access to the internet at home? (Yes/No).

    Parent_Education_Level: Highest education level of parents (None, High School, Bachelor's, Master's, PhD).

    Family_Income_Level: Low, Medium, High.

    Stress_Level (1-10): Self-reported stress level (1: Low, 10: High).

    Sleep_Hours_per_Night: Average hours of sleep per night.

    Sleep_Hours_per_Night_Entier: Average hours of sleep per night, as integers only.

    Country: Country of origin

    Dataset contains:

    Missing values (nulls): in some records (e.g., Attendance, Assignments, or Parent Education Level).

    Bias in some data (e.g., in grading: students with high attendance get slightly better grades).

    Imbalanced distributions (e.g., some departments having more students).

  15. Teasing Out the True Milky Way

    • dataverse.harvard.edu
    Updated Jan 23, 2020
    Cite
    Alyssa Goodman (2020). Teasing Out the True Milky Way [Dataset]. http://doi.org/10.7910/DVN/UPJJBV
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 23, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Alyssa Goodman
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Presentation Date: Tuesday, September 10, 2019. Location: Institute for Advanced Study, Princeton, NJ. Abstract: It has been nearly 100 years since the "Great Debate," where Heber Curtis argued that Thomas Wright's 1750 ideas about our Milky Way being one of many "galaxies," each a flattish disk of a multitude of stars, were correct. Since then, astronomers have made sharper and sharper images of galaxies beyond our own, often revealing intricate spiral structure. But, for the most part, our potentially super-close-up view of our own Galaxy's structure has been ruined by our unfortunate vantage point within its disk. Work over the past century indicates that the Milky Way is a barred spiral, but even the Galaxy's number of arms is still at issue. In this talk, I will discuss how four techniques are being combined to tease out the true structure of the Milky Way. In particular, our collaboration* is combining 3D-dust mapping, searches for extraordinarily long galactic filaments called "Bones," position-position-velocity observations of gas, and numerical simulations to create a new, and sometimes very surprising, view of our Galaxy. Unexpected results to be presented include: several-hundred-pc long, ~1-pc wide, gaseous "Bones" lying in, and likely defining, the gravitational mid-plane of the Milky Way; a 2.5 kpc-long damped sine wave with 200-pc amplitude that seems to be the Local Arm (and the undoing of "Gould's Belt"); and simulations that suggest the need for feedback and/or magnetic fields, and/or stranger physics (dark matter in the disk?) in order to explain the Bones and/or the Local Arm's Wave.

  16. Unveiling Insights from 100K Bike Sales

    • kaggle.com
    Updated Oct 1, 2024
    Cite
    Hari Goshika (2024). Unveiling Insights from 100K Bike Sales [Dataset]. https://www.kaggle.com/datasets/harigoshika/unveiling-insights-from-100k-bike-sales/discussion
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 1, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Hari Goshika
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    I'm excited to share my latest project—an interactive Power BI dashboard that provides a comprehensive analysis of bike sales data from 2019 to 2024!

    Key Highlights of the Dashboard:

    📈 Sales Trend Analysis: Understand how bike sales have fluctuated over the years, with peaks in specific months that give us clues about seasonal demand.
    🏢 Sales by Store Location: See how different cities like New York and Phoenix lead in terms of total sales revenue.
    🚴‍♀️ Customer Demographics: Almost equal contributions from male and female customers—showing the broad appeal of our products.
    💳 Payment Method Preferences: Breakdown of the most used payment methods, with insights that can help improve our customer experience.
    📊 Revenue by Bike Model: A detailed look at which bike models drive the most revenue, helping guide product focus and inventory management.

    This dashboard was built to provide actionable insights into the sales performance and customer behavior of a large dataset of 100K records. It highlights the power of data visualization in turning numbers into strategic insights!

    Why Power BI? Power BI's flexibility and interactive capabilities made it the ideal tool for visualizing the data, allowing users to drill down into specific details using slicers for bike models and time periods. 💡

    Would love to hear your thoughts or any feedback on this project! If you’re interested in how this dashboard was built or want to discuss data visualization, feel free to reach out. Let’s transform data into stories that drive success! 🌟

  17. Data from: California State Waters Map Series--Santa Barbara Channel Web...

    • data.amerigeoss.org
    • data.usgs.gov
    • +3 more
    xml
    Updated Aug 23, 2022
    + more versions
    Cite
    United States (2022). California State Waters Map Series--Santa Barbara Channel Web Services [Dataset]. https://data.amerigeoss.org/dataset/california-state-waters-map-series-santa-barbara-channel-web-services-b23aa
    Explore at:
    Available download formats: xml
    Dataset updated
    Aug 23, 2022
    Dataset provided by
    United States
    Area covered
    Santa Barbara Channel
    Description

    In 2007, the California Ocean Protection Council initiated the California Seafloor Mapping Program (CSMP), designed to create a comprehensive seafloor map of high-resolution bathymetry, marine benthic habitats, and geology within California’s State Waters. The program supports a large number of coastal-zone- and ocean-management issues, including the California Marine Life Protection Act (MLPA) (California Department of Fish and Wildlife, 2008), which requires information about the distribution of ecosystems as part of the design and proposal process for the establishment of Marine Protected Areas. A focus of CSMP is to map California’s State Waters with consistent methods at a consistent scale. The CSMP approach is to create highly detailed seafloor maps through collection, integration, interpretation, and visualization of swath sonar data (the undersea equivalent of satellite remote-sensing data in terrestrial mapping), acoustic backscatter, seafloor video, seafloor photography, high-resolution seismic-reflection profiles, and bottom-sediment sampling data. The map products display seafloor morphology and character, identify potential marine benthic habitats, and illustrate both the surficial seafloor geology and shallow (to about 100 m) subsurface geology. It is emphasized that the more interpretive habitat and geology data rely on the integration of multiple, new high-resolution datasets and that mapping at small scales would not be possible without such data. This approach and CSMP planning is based in part on recommendations of the Marine Mapping Planning Workshop (Kvitek and others, 2006), attended by coastal and marine managers and scientists from around the state. That workshop established geographic priorities for a coastal mapping project and identified the need for coverage of “lands” from the shore strand line (defined as Mean Higher High Water; MHHW) out to the 3-nautical-mile (5.6-km) limit of California’s State Waters. 
Unfortunately, surveying the zone from MHHW out to 10-m water depth is not consistently possible using ship-based surveying methods, owing to sea state (for example, waves, wind, or currents), kelp coverage, and shallow rock outcrops. Accordingly, some of the data presented in this series commonly do not cover the zone from the shore out to 10-m depth. This data is part of a series of online U.S. Geological Survey (USGS) publications, each of which includes several map sheets, some explanatory text, and a descriptive pamphlet. Each map sheet is published as a PDF file. Geographic information system (GIS) files that contain both ESRI ArcGIS raster grids (for example, bathymetry, seafloor character) and geotiffs (for example, shaded relief) are also included for each publication. For those who do not own the full suite of ESRI GIS and mapping software, the data can be read using ESRI ArcReader, a free viewer that is available at http://www.esri.com/software/arcgis/arcreader/index.html (last accessed September 20, 2013). The California Seafloor Mapping Program is a collaborative venture between numerous different federal and state agencies, academia, and the private sector. CSMP partners include the California Coastal Conservancy, the California Ocean Protection Council, the California Department of Fish and Wildlife, the California Geological Survey, California State University at Monterey Bay’s Seafloor Mapping Lab, Moss Landing Marine Laboratories Center for Habitat Studies, Fugro Pelagos, Pacific Gas and Electric Company, National Oceanic and Atmospheric Administration (NOAA, including National Ocean Service–Office of Coast Surveys, National Marine Sanctuaries, and National Marine Fisheries Service), U.S. Army Corps of Engineers, the Bureau of Ocean Energy Management, the National Park Service, and the U.S. Geological Survey. 
These web services for the Santa Barbara Channel map area include data layers that are associated with GIS and map sheets available from the USGS CSMP web page at https://walrus.wr.usgs.gov/mapping/csmp/index.html. Each published CSMP map area includes a data catalog of geographic information system (GIS) files; map sheets that contain explanatory text; and an associated descriptive pamphlet. This web service represents the available data layers for this map area. Data were combined from different sonar surveys to generate a comprehensive high-resolution bathymetry and acoustic-backscatter coverage of the map area. These data reveal a range of physiographic features, including exposed bedrock outcrops and large fields of sand waves, as well as many human impacts on the seafloor. To validate geological and biological interpretations of the sonar data, the U.S. Geological Survey towed a camera sled over specific offshore locations, collecting both video and photographic imagery; these “ground-truth” surveying data are available from the CSMP Video and Photograph Portal at https://doi.org/10.5066/F7J1015K. The “seafloor character” data layer shows classifications of the seafloor on the basis of depth, slope, rugosity (ruggedness), and backscatter intensity, further informed by the ground-truth-survey imagery. The “potential habitats” polygons are delineated on the basis of substrate type, geomorphology, seafloor process, or other attributes that may provide a habitat for a specific species or assemblage of organisms. Representative seismic-reflection profile data from the map area are also included and provide information on the subsurface stratigraphy and structure of the map area. The distribution and thickness of young sediment (deposited over the past about 21,000 years, during the most recent sea-level rise) is interpreted on the basis of the seismic-reflection data.
The geologic polygons merge onshore geologic mapping (compiled from existing maps by the California Geological Survey) and new offshore geologic mapping that is based on integration of high-resolution bathymetry and backscatter imagery, seafloor-sediment and rock samples, digital camera and video imagery, and high-resolution seismic-reflection profiles. The information provided by the map sheets, pamphlet, and data catalog has a broad range of applications. High-resolution bathymetry, acoustic backscatter, ground-truth-surveying imagery, and habitat mapping all contribute to habitat characterization and ecosystem-based management by providing essential data for delineation of marine protected areas and ecosystem restoration. Many of the maps provide high-resolution baselines that will be critical for monitoring environmental change associated with climate change, coastal development, or other forcings. High-resolution bathymetry is a critical component for modeling coastal flooding caused by storms and tsunamis, as well as inundation associated with longer term sea-level rise. Seismic-reflection and bathymetric data help characterize earthquake and tsunami sources, critical for natural-hazard assessments of coastal zones. Information on sediment distribution and thickness is essential to the understanding of local and regional sediment transport, as well as the development of regional sediment-management plans. In addition, siting of any new offshore infrastructure (for example, pipelines, cables, or renewable-energy facilities) will depend on high-resolution mapping. Finally, this mapping will both stimulate and enable new scientific research and also raise public awareness of, and education about, coastal environments and issues. Web services were created using an ArcGIS service definition file. The ArcGIS REST service and OGC WMS service include all Santa Barbara Channel map area data layers. Data layers are symbolized as shown on the associated map sheets.

  18. Apps with Chinese parents that are still popular in India - Chart

    • restofworld.org
    Updated Oct 4, 2022
    Cite
    Rest of World (2022). Apps with Chinese parents that are still popular in India - Chart [Dataset]. https://restofworld.org/charts/2022/tF7jU-apps-chinese-parents-popular-india
    Explore at:
    Dataset updated
    Oct 4, 2022
    Dataset authored and provided by
    Rest of World
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    A Rest of World audit of the Google Play store indicated that at least eight of the 100 most-downloaded free apps in India may be owned by large Chinese parent companies.

  19. Visualizations of rotational curves within a Standardized Gait Cycle

    • data.mendeley.com
    Updated May 4, 2022
    + more versions
    Cite
    Jürgen Konradi (2022). Visualizations of rotational curves within a Standardized Gait Cycle [Dataset]. http://doi.org/10.17632/m7tbn7vhpf.1
    Explore at:
    Dataset updated
    May 4, 2022
    Authors
    Jürgen Konradi
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains graphs and a movie. Both show visualizations of rotational curves in the transversal plane within a Standardized Gait Cycle from Vertebra prominens downwards, ending at the pelvis. They display 201 anonymous healthy people aged 18-70 years walking at 2, 3, 4, and 5 km/h on a treadmill. They are based on an SPSS (v23) syntax file and a related graph template, both of which can be found among our datasets as well. Files are numbered consecutively across all speeds, and each can be linked by number to its non-standardized counterpart in a further dataset. Positive values show vertebral body rotation to the left, negative values show rotation to the right. Percent of the Standardized Gait Cycle (0-100%) is displayed on the abscissa, always starting with Initial Contact of the right foot. Within a Standardized Gait Cycle the duration of the right stance phase is expected to be 60% (Perry, 1992). As can be seen in the graphs, interpolating spline functions work for average walking speed measurements, leading to a more precise determination of relevant and characteristic points (e.g. maxima, phase shifts, lumbar and thoracic movement behavior), thereby aiding in the clarification of individual features.

  20. Scalable ParaView for Extreme Scale Visualization, Phase I

    • data.nasa.gov
    application/rdfxml +5
    Updated Jun 26, 2018
    Cite
    (2018). Scalable ParaView for Extreme Scale Visualization, Phase I [Dataset]. https://data.nasa.gov/dataset/Scalable-ParaView-for-Extreme-Scale-Visualization-/up7h-hkky
    Explore at:
    Available download formats: csv, tsv, xml, application/rssxml, application/rdfxml, json
    Dataset updated
    Jun 26, 2018
    License

    U.S. Government Works https://www.usa.gov/government-works
    License information was derived automatically

    Description

    Petascale computing is leading to significant breakthroughs in a number of fields and is revolutionizing the way science is conducted. Data is not knowledge, however, and the challenge has been how to analyze and gain insight from the massive quantities of data that are generated. In order to address the petascale visualization challenges, we propose to develop scientific visualization software that would enable real-time visualization of extremely large data sets. We plan to accomplish this by extending the ParaView visualization architecture to extreme scales. ParaView is open source software installed on all HPC sites, including NASA's Pleiades, and has a large user base in diverse areas of science and engineering. Our proposed solution will significantly enhance the scientific return from NASA HPC investments by providing the next generation of open source data analysis and visualization tools for very large datasets. To test our solution on real-world data with a complex pipeline, we have partnered with SciberQuest, who have recently performed the largest kinetic simulations of the magnetosphere using 25 K cores on Pleiades and 100 K cores on Kraken. Given that I/O is the main bottleneck for scientific visualization at large scales, we propose to work closely with Pleiades's systems team and provide an efficient, prepackaged, general-purpose I/O component for ParaView for structured and unstructured data across a spectrum of scales and access patterns, with a focus on the Lustre file system used by Pleiades.

Cite
Chiticariu Cristian (2021). One Hundred Cities [Dataset]. https://www.kaggle.com/datasets/chiticariucristian/one-hundred-cities/discussion

One Hundred Cities

100 cities from the world with their short description and population

Explore at:
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 29, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Chiticariu Cristian
Description

Context

100 cities

Content

The dataset consists of one hundred cities around the world, with a short description and the population of each.

Acknowledgements

The data was extracted from https://www.bestcities.org/rankings/worlds-best-cities/
