17 datasets found
  1. Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...

    • zenodo.org
    • explore.openaire.eu
    zip
    Updated Oct 20, 2022
    Cite
    Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Athena Vakali; Joao Palotti; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. http://doi.org/10.5281/zenodo.6832242
    Explore at:
    zip (available download formats)
    Dataset updated
    Oct 20, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Athena Vakali; Joao Palotti; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LifeSnaps Dataset Documentation

    Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in-the-wild, large-scale physical activity patterns, sleep, stress, and overall health on the one hand and behavioral patterns and psychological measurements on the other, due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset containing a plethora of anthropological data, collected unobtrusively over the course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types, from second-level to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data openly available to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.

    The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.

    Data Import: Reading CSV

    For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
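    As a minimal sketch of this step (the filename below is a placeholder; use the actual daily or hourly CSV shipped with the dataset):

    import pandas as pd

    # Placeholder filename - substitute one of the provided Fitbit/SEMA/survey CSVs
    df = pd.read_csv("lifesnaps_daily.csv")
    print(df.shape)
    print(df.head())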

    Data Import: Setting up a MongoDB (Recommended)

    To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data by importing the LifeSnaps MongoDB database.

    To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have the MongoDB Database Tools installed.

    For the Fitbit data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c fitbit 

    For the SEMA data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c sema 

    For surveys data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c surveys 

    If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
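    Once the collections are restored, they can be queried from Python; the following is a minimal sketch with pymongo, assuming a default local MongoDB instance and the database and collection names used above:

    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)
    db = client["rais_anonymized"]

    # Count documents and fetch one sample record from the fitbit collection
    print(db["fitbit"].count_documents({}))
    print(db["fitbit"].find_one())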

    Data Availability

    The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:

    {
      _id: 
  2. Data from: Code4ML: a Large-scale Dataset of annotated Machine Learning Code...

    • zenodo.org
    csv
    Updated Sep 15, 2023
    Cite
    Anonymous authors (2023). Code4ML: a Large-scale Dataset of annotated Machine Learning Code [Dataset]. http://doi.org/10.5281/zenodo.6607065
    Explore at:
    csv (available download formats)
    Dataset updated
    Sep 15, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous authors
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present Code4ML: a Large-scale Dataset of annotated Machine Learning Code, a corpus of Python code snippets, competition summaries, and data descriptions from Kaggle.

    The data is organized in a table structure. Code4ML includes several main objects: competition information, raw code blocks collected from Kaggle, and manually marked-up snippets. Each table is stored as a .csv file.

    Each competition has a text description and metadata reflecting the competition and dataset characteristics as well as the evaluation metrics (competitions.csv). The corresponding datasets can be loaded using the Kaggle API and data sources.

    The code blocks themselves and their metadata are collected into data frames according to the publishing year of the initial kernels. The current version of the corpus includes two code block files: snippets from kernels up to 2020 (сode_blocks_upto_20.csv) and those from 2021 (сode_blocks_21.csv), with corresponding metadata. The corpus consists of 2 743 615 ML code blocks collected from 107 524 Jupyter notebooks.

    Marked up code blocks have the following metadata: anonymized id, the format of the used data (for example, table or audio), the id of the semantic type, a flag for the code errors, the estimated relevance to the semantic class (from 1 to 5), the id of the parent notebook, and the name of the competition. The current version of the corpus has ~12 000 labeled snippets (markup_data_20220415.csv).

    As marked up code blocks data contains the numeric id of the code block semantic type, we also provide a mapping from this number to semantic type and subclass (actual_graph_2022-06-01.csv).

    The dataset can help solve various problems, including code synthesis from a prompt in natural language, code autocompletion, and semantic code classification.
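    A minimal sketch of loading the tables named above with pandas (file names as given in the description; check the exact spelling of the files in the archive):

    import pandas as pd

    competitions = pd.read_csv("competitions.csv")
    code_blocks = pd.read_csv("code_blocks_upto_20.csv")
    markup = pd.read_csv("markup_data_20220415.csv")

    print(len(competitions), len(code_blocks), len(markup))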

  3. Spectra of Earth-like Planets

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 23, 2021
    Cite
    Kaltenegger, Lisa (2021). Spectra of Earth-like Planets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4697878
    Explore at:
    Dataset updated
    Apr 23, 2021
    Dataset provided by
    Kaltenegger, Lisa
    Pham, Dang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Earth
    Description

    Spectra data generated for "Color Classification of Earth-like Planets with Machine Learning" (https://academic.oup.com/mnras/advance-article-abstract/doi/10.1093/mnras/stab1144/6247611).

    The flux (units: W/m^2) can be accessed in the flux.pk (pickle file) or flux.csv (comma-separated file). These files also contain the biota information and composition of various surfaces. There are 318,780 spectra generated in total. The spectra include a 6 km cloud layer and Rayleigh scattering. The surface compositions are: cloud, seawater, sand, snow, and biota (six kinds), in 5% resolution steps for each composition.

    The wavelength (units: micrometer) can be accessed in the wavelength.pk (pickle file) or wavelength.csv (comma-separated file). The wavelength ranges from 0.36 micrometers to 1.1 micrometers, with 1000 sampling points.

    To access the pickle file using Python:

    import pickle
    import pandas

    # load the wavelength dataframe
    wavelength_dataframe = pickle.load(open('wavelength.pk', 'rb'))

    # load the fluxes dataframe
    flux_dataframe = pickle.load(open('flux.pk', 'rb'))

    The objects loaded by the pickle files will be Pandas dataframes.

  4. CERNatschool-frame-reader Test Dataset - CSV format

    • figshare.com
    txt
    Updated Jun 2, 2023
    Cite
    Tom Whyntie (2023). CERNatschool-frame-reader Test Dataset - CSV format [Dataset]. http://doi.org/10.6084/m9.figshare.674546.v1
    Explore at:
    txt (available download formats)
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Tom Whyntie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a test dataset taken by a CERN@school Timepix detector in Comma Separated Value (CSV) format. It consists of the frame data for three 256x256 pixel frames, with each frame's data in a separate file. The original binary format data may be found at the figshare link below. The data themselves are the readings from the pixels (X, Y, number of counts) caused by particles incident on the Timepix detector's silicon sensor element when exposed to a potassium chloride source. Three frames were taken with an acquisition time of 60 seconds. Further information may be found on the CERN@school website. A simple frame display (written in Python, with matplotlib) may be found in the GitHub repository linked below.
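    As a rough illustration (not the repository's own script), one way to render a frame from the (X, Y, counts) readings; the filename, column order, and comma separator are assumptions to be checked against the actual files:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # Assumed layout: three unnamed, comma-separated columns of X pixel, Y pixel, counts
    frame = pd.read_csv("frame_0.csv", names=["x", "y", "counts"])

    # Build a 256x256 image from the sparse pixel hits
    image = np.zeros((256, 256))
    image[frame["y"], frame["x"]] = frame["counts"]

    plt.imshow(image, origin="lower")
    plt.colorbar(label="counts")
    plt.show()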

  5. Data from: A Phanerozoic gridded dataset for palaeogeographic...

    • data.niaid.nih.gov
    • portalcientifico.uvigo.gal
    • +1more
    Updated May 29, 2024
    Cite
    Jones, Lewis A. (2024). A Phanerozoic gridded dataset for palaeogeographic reconstructions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10069221
    Explore at:
    Dataset updated
    May 29, 2024
    Dataset provided by
    Domeier, Mathew
    Jones, Lewis A.
    License

    GNU General Public License 3.0: https://www.gnu.org/licenses/gpl-3.0-standalone.html

    Description

    This repository provides access to five pre-computed reconstruction files as well as the static polygons and rotation files used to generate them. This set of palaeogeographic reconstruction files provides palaeocoordinates for three global grids at H3 resolutions 2, 3, and 4, which have an average cell spacing of ~316 km, ~119 km, and ~45 km, respectively. Grids were reconstructed at a temporal resolution of one million years throughout the entire Phanerozoic (540–0 Ma).

    The reconstruction files are stored as comma-separated-value (CSV) files which can be easily read by almost any spreadsheet program (e.g. Microsoft Excel and Google Sheets) or programming language (e.g. Python, Julia, and R). In addition, R Data Serialization (RDS) files—a common format for saving R objects—are also provided as lighter (and compressed) alternatives to the CSV files.

    The structure of the reconstruction files follows a wide-form data frame structure to ease indexing. Each file consists of three initial index columns relating to the H3 cell index (i.e. the 'H3 address'), the present-day longitude of the cell centroid, and the present-day latitude of the cell centroid. The subsequent columns provide the reconstructed longitudinal and latitudinal coordinate pairs for their respective age of reconstruction in ascending order, indicated by a numerical suffix. Each row contains a unique spatial point on the Earth's continental surface reconstructed through time. NA values within the reconstruction files indicate points which are not defined in deeper time (i.e. either the static polygon does not exist at that time, or it is outside the temporal coverage as defined by the rotation file).
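    A minimal sketch of pulling the palaeocoordinates for a single reconstruction age out of one of the CSV files; the file name is a placeholder, and the assumption that reconstructed columns end in a numeric age suffix (e.g. "_100" for 100 Ma) should be checked against the file header:

    import pandas as pd

    # Placeholder file name; use one of the reconstruction CSVs from the GPMs folder
    recon = pd.read_csv("reconstruction_file.csv")

    # First three columns: H3 address, present-day lon, present-day lat (per the description);
    # remaining columns are assumed to carry an age suffix such as "_100"
    age_cols = [c for c in recon.columns if c.endswith("_100")]
    subset = pd.concat([recon.iloc[:, :3], recon[age_cols]], axis=1).dropna()
    print(subset.head())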

    The following five Global Plate Models are provided (abbreviation, temporal coverage, reference) within the GPMs folder:

    WR13, 0–550 Ma, (Wright et al., 2013)

    MA16, 0–410 Ma, (Matthews et al., 2016)

    TC16, 0–540 Ma, (Torsvik and Cocks, 2016)

    SC16, 0–1100 Ma, (Scotese, 2016)

    ME21, 0–1000 Ma, (Merdith et al., 2021)

    In addition, the H3 grids for resolutions 2, 3, and 4 are provided within the grids folder. Finally, we also provide two scripts (python and R) within the code folder which can be used to generate reconstructed coordinates for user data from the reconstruction files.

    For access to the code used to generate these files:

    https://github.com/LewisAJones/PhanGrids

    For more information, please refer to the article describing the data:

    Jones, L.A. and Domeier, M.M. (2024). A Phanerozoic gridded dataset for palaeogeographic reconstructions.

    For any additional queries, contact:

    Lewis A. Jones (lewisa.jones@outlook.com) or Mathew M. Domeier (mathewd@uio.no)

    If you use these files, please cite:

    Jones, L.A. and Domeier, M.M. 2024. A Phanerozoic gridded dataset for palaeogeographic reconstructions. DOI: 10.5281/zenodo.10069221

    References

    Matthews, K. J., Maloney, K. T., Zahirovic, S., Williams, S. E., Seton, M., & Müller, R. D. (2016). Global plate boundary evolution and kinematics since the late Paleozoic. Global and Planetary Change, 146, 226–250. https://doi.org/10.1016/j.gloplacha.2016.10.002.

    Merdith, A. S., Williams, S. E., Collins, A. S., Tetley, M. G., Mulder, J. A., Blades, M. L., Young, A., Armistead, S. E., Cannon, J., Zahirovic, S., & Müller, R. D. (2021). Extending full-plate tectonic models into deep time: Linking the Neoproterozoic and the Phanerozoic. Earth-Science Reviews, 214, 103477. https://doi.org/10.1016/j.earscirev.2020.103477.

    Scotese, C. R. (2016). Tutorial: PALEOMAP paleoAtlas for GPlates and the paleoData plotter program: PALEOMAP Project, Technical Report.

    Torsvik, T. H., & Cocks, L. R. M. (2017). Earth history and palaeogeography. Cambridge University Press. https://doi.org/10.1017/9781316225523.

    Wright, N., Zahirovic, S., Müller, R. D., & Seton, M. (2013). Towards community-driven paleogeographic reconstructions: Integrating open-access paleogeographic and paleobiology data with plate tectonics. Biogeosciences, 10, 1529–1541. https://doi.org/10.5194/bg-10-1529-2013.

  6. ARtracks - a Global Atmospheric River Catalogue Based on ERA5 and IPART

    • zenodo.org
    • explore.openaire.eu
    • +1more
    zip
    Updated May 2, 2024
    Cite
    Dominik Traxl (2024). ARtracks - a Global Atmospheric River Catalogue Based on ERA5 and IPART [Dataset]. http://doi.org/10.5281/zenodo.7018725
    Explore at:
    zip (available download formats)
    Dataset updated
    May 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Dominik Traxl
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The ARtracks Atmospheric River Catalogue is based on the ERA5 climate reanalysis dataset, specifically the output parameters "vertical integral of east-/northward water vapour flux". Most of the processing relies on IPART (Image-Processing based Atmospheric River (AR) Tracking, https://github.com/ihesp/IPART), a Python package for automated AR detection, axis finding and AR tracking. The catalogue is provided as a pickled pandas.DataFrame as well as a CSV file.

    For detailed information, please see https://github.com/dominiktraxl/artracks.

    The ARtracks catalogue covers the years from 1979 to the end of the year 2019.
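    A minimal sketch of opening the catalogue with pandas (file names here are placeholders; see the GitHub page above for the actual file names and column semantics):

    import pandas as pd

    # Either the pickled DataFrame or the CSV can be used
    tracks = pd.read_pickle("artracks.pkl")   # or: pd.read_csv("artracks.csv")
    print(tracks.columns.tolist())
    print(tracks.head())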

  7. Analysis of references in the IPCC AR6 WG2 Report of 2022

    • data.niaid.nih.gov
    Updated Mar 11, 2022
    Cite
    Bianca Kramer (2022). Analysis of references in the IPCC AR6 WG2 Report of 2022 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6327206
    Explore at:
    Dataset updated
    Mar 11, 2022
    Dataset provided by
    Cameron Neylon
    Bianca Kramer
    License

    Public Domain: https://creativecommons.org/licenses/publicdomain/

    Description

    This repository contains data on 17,419 DOIs cited in the IPCC Working Group 2 contribution to the Sixth Assessment Report, and the code to link them to the dataset built at the Curtin Open Knowledge Initiative (COKI).

    References were extracted from the report's PDFs (downloaded 2022-03-01) via Scholarcy and exported as RIS and BibTeX files. DOI strings were identified in the RIS files by pattern matching and saved as a CSV file. The list of DOIs for each chapter and cross-chapter paper was processed using a custom Python script to generate a pandas DataFrame, which was saved as a CSV file and uploaded to Google BigQuery.
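    As an illustration of the pattern-matching step (not the repository's own preprocessing code), DOI strings can be pulled out of RIS text with a regular expression along these lines; the file name and the simplified DOI pattern are assumptions:

    import re
    import pandas as pd

    DOI_RE = re.compile(r"10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)

    with open("chapter_01.ris", encoding="utf-8") as f:   # hypothetical RIS export
        dois = sorted(set(DOI_RE.findall(f.read())))

    pd.DataFrame({"doi": dois}).to_csv("chapter_01_dois.csv", index=False)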

    We used the main object table of the Academic Observatory, which combines information from Crossref, Unpaywall, Microsoft Academic, Open Citations, the Research Organization Registry and Geonames to enrich the DOIs with bibliographic information, affiliations, and open access status. A custom query was used to join and format the data and the resulting table was visualised in a Google DataStudio dashboard.

    This version of the repository also includes the set of DOIs from references in the IPCC Working Group 1 contribution to the Sixth Assessment Report as extracted by Alexis-Michel Mugabushaka and shared on Zenodo: https://doi.org/10.5281/zenodo.5475442 (CC-BY)

    A brief descriptive analysis was provided as a blogpost on the COKI website.

    The repository contains the following content:

    Data:

    data/scholarcy/RIS/ - extracted references as RIS files

    data/scholarcy/BibTeX/ - extracted references as BibTeX files

    IPCC_AR6_WGII_dois.csv - list of DOIs

    data/10.5281_zenodo.5475442/ - references from IPCC AR6 WG1 report

    Processing:

    preprocessing.R - preprocessing steps for identifying and cleaning DOIs

    process.py - Python script for transforming data and linking to COKI data through Google Big Query

    Outcomes:

    Dataset on BigQuery - requires a google account for access and bigquery account for querying

    Data Studio Dashboard - interactive analysis of the generated data

    Zotero library of references extracted via Scholarcy

    PDF version of blogpost

    Note on licenses: Data are made available under CC0 (with the exception of WG1 reference data, which have been shared under CC-BY 4.0) Code is made available under Apache License 2.0

  8. sustainable-fashion

    • kaggle.com
    Updated Jan 8, 2025
    Cite
    Tiyab K. (2025). sustainable-fashion [Dataset]. https://www.kaggle.com/datasets/tiyabk/sustainable-fashion
    Explore at:
    Croissant (available download format). Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 8, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Tiyab K.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sustainable Fashion Q&A Dataset

    This dataset contains a collection of synthetically generated Question-Answer (Q&A) pairs on sustainable fashion and style, with an emphasis on timeless wardrobe pieces, sustainable choices, and capsule wardrobe principles. The data was created using a large language model with advanced reasoning, prompted with various grounded contexts and real-world examples. It can be used to train or evaluate models that specialize in sustainable fashion advice, styling recommendations, or instruction-following tasks.

    Overview

    • Context: The data focuses on classic, long-lasting wardrobe recommendations. Topics include choosing neutral color palettes, selecting high-quality fabrics (like wool), finding universally flattering silhouettes, and embracing sustainability in fashion choices...

    • Structure: Each entry is formatted, containing two primary fields:

      • instruction – The user’s question or prompt
      • response – The corresponding answer or advice
    • Example Entry (Truncated for Clarity):

      instruction,response
      "What makes a neutral color palette so timeless?","Neutral tones like black, navy, beige, and gray offer unmatched versatility..."

    Data Generation

    • Synthetic Creation:
      This dataset is synthetic—the questions and answers were generated by a large language model. The prompts used in creation were seeded with diverse real-world fashion contexts and examples to ensure groundedness and practical relevance.

    • Advanced Reasoning:
      The large language model was employed to simulate more detailed and nuanced fashion advice, making each Q&A pair comprehensive yet concise. Despite the synthetic nature, the reasoning incorporates established fashion principles and best practices.

    Dataset Contents

    • instruction - A concise question related to fashion, style tips, capsule wardrobes, or sustainability.
    • response - A short, detailed answer offering timeless styling advice, illustrating best practices in fashion.

    Potential Use Cases

    1. Sustainable Fashion Chatbot/Assistant:

      • Train a model to provide on-demand styling advice or recommendations for various occasions.
    2. Instruction-Following/QA Models:

      • Ideal for fine-tuning large language models (LLMs) so they can handle fashion-specific questions accurately.
    3. Content Generation:

      • Generate blog articles, social media content, or editorial pieces on sustainable and timeless fashion, using the Q&A patterns as seed material.
    4. Sustainable Fashion Product Descriptions:

      • Leverage the dataset to help a model create consistent, on-brand descriptions for apparel and accessories.

    Getting Started

    1. Download the Dataset

      • The data is provided as a csv file where each line is a single record with the keys instruction and response.
    2. Data Preprocessing

      • Many Q&A or instruction-based fine-tuning frameworks allow direct ingestion of CSV files.
      • Alternatively, convert the data into your preferred format ( Pandas DataFrame, etc.) for custom processing.
    3. Sample Use
      ```python
      import csv

      # Load the data
      data = []
      with open('sustainable_fashion.csv', 'r', encoding='utf-8') as f:
          reader = csv.DictReader(f)
          for row in reader:
              data.append(row)

      # Example: Print the first Q&A
      print("Question:", data[0]['instruction'])
      print("Answer:", data[0]['response'])
      ```

    4. Model Fine-Tuning
      • If using a language model (e.g., Gemma-style), you can structure each entry with a prompt and desired response, as sketched below.
      • Incorporate additional context like a system message:
        You are a fashion advisor. Provide concise, accurate style guidance.
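    For example, one way to fold the system message and a record into a single training example (a sketch; the template and field names should be adapted to your fine-tuning framework):

      ```python
      SYSTEM = "You are a fashion advisor. Provide concise, accurate style guidance."

      def to_training_example(record):
          # record is one row from the CSV with 'instruction' and 'response' keys
          prompt = f"{SYSTEM}\n\nUser: {record['instruction']}\nAssistant:"
          return {"prompt": prompt, "completion": " " + record["response"]}

      examples = [to_training_example(r) for r in data]  # 'data' as loaded in the sample above
      ```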

    Tips for Best Results

    • Maintain Consistency:

      • When fine-tuning, keep the format of instruction and response consistent. Models often learn better with clearly defined roles.
    • Supplementary Data:

      • If your application requires broader knowledge (e.g., fashion trends or brand-specific info), consider augmenting this dataset with additional Q&A examples or general fashion text data.
    • Evaluate Quality:

      • Periodically check the model’s responses using domain experts or user feedback. Adjust or expand the dataset if you notice gaps in the model’s understanding.
    • Ethical and Inclusive Language:

      • Fashion advice can intersect with body image and cultural preferences. Ensure your final application provides inclusive and considerate guidance.
  9. ‘Austin's data portal activity metrics’ analyzed by Analyst-2

    • analyst-2.ai
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Austin's data portal activity metrics’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-austin-s-data-portal-activity-metrics-1ce3/1b069fcb/?iid=059-557&v=presentation
    Explore at:
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Austin
    Description

    Analysis of ‘Austin's data portal activity metrics’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/data-portal-activity-metricse on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    Background

    Austin's open data portal provides lots of public data about the City of Austin. It also provides portal administrators with behind-the-scenes information about how the portal is used... but that data is mysterious, hard to handle in a spreadsheet, and not located all in one place.

    Until now! Authorized city staff used admin credentials to grab this usage data and share it with the public. The City of Austin wants to use this data to inform the development of its open data initiative and manage the open data portal more effectively.

    This project contains related datasets for anyone to explore. These include site-level metrics, dataset-level metrics, and department information for context. A detailed description of how the files were prepared (along with code) can be found on GitHub.

    Example questions to answer about the data portal

    1. What parts of the open data portal do people seem to value most?
    2. What can we tell about who our users are?
    3. How are our data publishers doing?
    4. How much data is published programmatically vs manually?
    5. How much data is super fresh? Super stale?
    6. Whatever you think we should know...

    About the files

    all_views_20161003.csv

    There is a resource available to portal administrators called "Dataset of datasets". This is the export of that resource, and it was captured on Oct 3, 2016. It contains a summary of the assets available on the data portal. While this file contains over 1400 resources (such as views, charts, and binary files), only 363 are actual tabular datasets.

    table_metrics_ytd.csv

    This file contains information about the 363 tabular datasets on the portal. Activity metrics for an individual dataset can be accessed by calling Socrata's views/metrics API and passing along the dataset's unique ID, a time frame, and admin credentials. The process of obtaining the 363 identifiers, calling the API, and staging the information can be reviewed in the python notebook here.

    site_metrics.csv

    This file is the export of site-level stats that Socrata generates using a given time frame and grouping preference. This file contains records about site usage each month from Nov 2011 through Sept 2016. By the way, it contains 285 columns... and we don't know what many of them mean. But we are determined to find out!! For a preliminary exploration of the columns and which portal-related business processes they might relate to, check out the notes in this python notebook.

    city_departments_in_current_budget.csv

    This file contains a list of all City of Austin departments according to how they're identified in the most recently approved budget documents. Could be helpful for getting to know more about who the publishers are.

    crosswalk_to_budget_dept.csv

    The City is in the process of standardizing how departments identify themselves on the data portal. In the meantime, here's a crosswalk from the department values observed in all_views_20161003.csv to the department names that appear in the City's budget.
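    A minimal sketch of loading the files described above with pandas (file names as listed; adjust paths to wherever the files were downloaded):

    import pandas as pd

    views = pd.read_csv("all_views_20161003.csv")
    table_metrics = pd.read_csv("table_metrics_ytd.csv")
    site_metrics = pd.read_csv("site_metrics.csv")

    print(views.shape, table_metrics.shape, site_metrics.shape)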

    This dataset was created by Hailey Pate and contains around 100 samples along with Di Sync Success, Browser Firefox 19, technical information and other features such as: Browser Firefox 33, Di Sync Failed, and more.

    How to use this dataset

    • Analyze Sf Query Error User in relation to Js Page View Admin
    • Study the influence of Browser Firefox 37 on Datasets Created
    • More datasets

    Acknowledgements

    If you use this dataset in your research, please credit Hailey Pate

    Start A New Notebook!

    --- Original source retains full ownership of the source dataset ---

  10. Dataset supporting Figure 3 in "Pan-cortical 2-photon mesoscopic imaging and...

    • plus.figshare.com
    bin
    Updated Feb 1, 2024
    Cite
    Evan Vickers; David A. McCormick (2024). Dataset supporting Figure 3 in "Pan-cortical 2-photon mesoscopic imaging and neurobehavioral alignment in awake, behaving mice" [Dataset]. http://doi.org/10.25452/figshare.plus.25114463.v1
    Explore at:
    bin (available download formats)
    Dataset updated
    Feb 1, 2024
    Dataset provided by
    Figshare+
    Authors
    Evan Vickers; David A. McCormick
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Contains data from a single 1-photon widefield imaging experiment and a single Thorlabs mesoscope 2-photon imaging session from the same side mount mouse, corresponding to panels a-c and d of Figure 3, respectively. Included files contain imaging data, behavioral data, and python files with combined neurobehavioral data.

    Note that session names have the following format: "mouse#_bigEndianDate_cage#_info_session#_attempt#". Raw mesoscope imaging data is included in ScanImage rendered format as single big tiffs with the following nomenclature: "filename_2D.tiff". Mouse face and body cam images are included as standalone or concatenated .avi movie files, and behavioral data is included both as Spike2 files (smrx) and in exported form as Matlab data files (.mat).

    In all cases the first frame of the 2-photon movie, the right face/body movie, and Spike2 data are aligned to the first Labview-issued frameclock trigger (also recorded in Spike2, along with all other frameclock events). 2-photon triggers were sometimes incorrectly recorded in Spike2 (generally we recorded these as both events and waveforms), but were in all cases additionally exported from ScanImage tiff metadata as timestamps (csv files ending in header.csv). Session start-time timestamps, also exported from ScanImage tiff metadata, appear as .txt files ending in "_starttime.txt".

    Preprocessed data (python) can be found in npy files with various names, each containing different subsets of variables relevant to the analysis. For each session, the npy file containing the string "standard_frames" contains the most complete, final-stage set of preprocessed neurobehavioral data (in combined DataFrame format, exportable to nwb), including CCF/MMM alignments. The file containing the string "nb_dump" contains a large set of auxiliary variables that may be needed for additional preprocessing.

    Additional image files (tiff, png) and excel worksheets (xlsx, csv) containing high-level data summaries and records of intermediate analysis steps are also included. Please contact the authors for any additional clarifications as needed. See related materials in Collection at: https://doi.org/10.25452/figshare.plus.c.7052513
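    A minimal sketch of opening one of the preprocessed npy files in Python (the file name is hypothetical, and allow_pickle is assumed to be required because the arrays hold Python/pandas objects):

    import numpy as np

    # Hypothetical file name following the "standard_frames" naming described above
    obj = np.load("session_standard_frames.npy", allow_pickle=True)
    print(type(obj), getattr(obj, "shape", None))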

  11. Dataset supporting Figure S6_E235 in "Pan-cortical 2-photon mesoscopic...

    • plus.figshare.com
    bin
    Updated Feb 1, 2024
    Cite
    Evan Vickers; David A. McCormick (2024). Dataset supporting Figure S6_E235 in "Pan-cortical 2-photon mesoscopic imaging and neurobehavioral alignment in awake, behaving mice" [Dataset]. http://doi.org/10.25452/figshare.plus.25115220.v1
    Explore at:
    bin (available download formats)
    Dataset updated
    Feb 1, 2024
    Dataset provided by
    Figshare+
    Authors
    Evan Vickers; David A. McCormick
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Contains data from a single 1-photon widefield imaging experiment and a single Thorlabs mesoscope 2-photon imaging session from the same side mount mouse as in Figures 5 and 6 (but a different session than in those Figures), corresponding to panels d-f of Figure S6. Included files contain imaging data, behavioral data, and python files with combined neurobehavioral data. Additional python files corresponding to this session can be found in the FigShare+ folder in this collection corresponding to Figure 6 (this session was used in the BSOiD model training for the session that was fit in Figure 6).

    Note that session names have the following format: "mouse#_bigEndianDate_cage#_info_session#_attempt#". Raw mesoscope imaging data is included in ScanImage rendered format as single big tiffs with the following nomenclature: "filename_2D.tiff". Mouse face and body cam images are included as standalone or concatenated .avi movie files, and behavioral data is included both as Spike2 files (smrx) and in exported form as Matlab data files (.mat).

    In all cases the first frame of the 2-photon movie, the right face/body movie, and Spike2 data are aligned to the first Labview-issued frameclock trigger (also recorded in Spike2, along with all other frameclock events). 2-photon triggers were sometimes incorrectly recorded in Spike2 (generally we recorded these as both events and waveforms), but were in all cases additionally exported from ScanImage tiff metadata as timestamps (csv files ending in header.csv). Session start-time timestamps, also exported from ScanImage tiff metadata, appear as .txt files ending in "_starttime.txt".

    Preprocessed data (python) can be found in npy files with various names, each containing different subsets of variables relevant to the analysis. For each session, the npy file containing the string "standard_frames" contains the most complete, final-stage set of preprocessed neurobehavioral data (in combined DataFrame format, exportable to nwb), including CCF/MMM alignments. The file containing the string "nb_dump" contains a large set of auxiliary variables that may be needed for additional preprocessing.

    Additional image files (tiff, png) and excel worksheets (xlsx, csv) containing high-level data summaries and records of intermediate analysis steps are also included. Please contact the authors for any additional clarifications as needed. See related materials in Collection at: https://doi.org/10.25452/figshare.plus.c.7052513

  12. SELTO Dataset

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated May 23, 2023
    Cite
    Sören Dittmer; David Erzmann; Henrik Harms; Rielson Falck; Marco Gosch (2023). SELTO Dataset [Dataset]. http://doi.org/10.5281/zenodo.7034899
    Explore at:
    application/gzip (available download formats)
    Dataset updated
    May 23, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sören Dittmer; David Erzmann; Henrik Harms; Rielson Falck; Marco Gosch
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A Benchmark Dataset for Deep Learning-based Methods for 3D Topology Optimization.

    One can find a description of the provided dataset partitions in Section 3 of Dittmer, S., Erzmann, D., Harms, H., Maass, P., SELTO: Sample-Efficient Learned Topology Optimization (2022) https://arxiv.org/abs/2209.05098.


    Every dataset container consists of multiple enumerated pairs of CSV files. Each pair describes a unique topology optimization problem and a corresponding binarized SIMP solution. Every file of the form {i}.csv contains all voxel-wise information about the sample i. Every file of the form {i}_info.csv contains scalar parameters of the topology optimization problem, such as material parameters.


    This dataset represents topology optimization problems and solutions on the basis of voxels. We define all spatially varying quantities via the voxels' centers -- rather than via the vertices or surfaces of the voxels.
    In {i}.csv files, each row corresponds to one voxel in the design space. The columns correspond to ['x', 'y', 'z', 'design_space', 'dirichlet_x', 'dirichlet_y', 'dirichlet_z', 'force_x', 'force_y', 'force_z', 'density'].

    • x, y, z - These are three integer indices stating the index/location of the voxel within the voxel mesh.
    • design_space - This is one ternary variable indicating the type of material density constraint on the voxel within the TO problem formulation. "0" and "1" indicate a material density fixed at 0 or 1, respectively. "-1" indicates the absence of constraints.
    • dirichlet_x, dirichlet_y, dirichlet_z - These are three binary variables defining whether the voxel contains homogeneous Dirichlet constraints in the respective axis direction.
    • force_x, force_y, force_z - These are three floating point variables giving the three spatial components of the forces applied to each voxel. All forces are body forces given in [N/m^3].
    • density - This is a binary variable stating whether the voxel carries material in the solution of the topology optimization problem.

    Any of these files with the index i can be imported using pandas by executing:

    import pandas as pd
    
    directory = ...
    file_path = f'{directory}/{i}.csv'
    column_names = ['x', 'y', 'z', 'design_space','dirichlet_x', 'dirichlet_y', 'dirichlet_z', 'force_x', 'force_y', 'force_z', 'density']
    data = pd.read_csv(file_path, names=column_names)

    From this pandas dataframe one can extract the torch tensors of forces F, Dirichlet conditions ω_Dirichlet, and design space information ω_design using the following functions:

    import torch
    
    def get_shape_and_voxels(data):
      # The last row holds the largest (x, y, z) indices, so +1 yields the grid shape
      shape = data[['x', 'y', 'z']].iloc[-1].values.astype(int) + 1
      vox_x = data['x'].values
      vox_y = data['y'].values
      vox_z = data['z'].values
      voxels = [vox_x, vox_y, vox_z]
      return shape, voxels
    
    
    def get_forces_boundary_conditions_and_design_space(data, shape, voxels):
      # Scatter the per-voxel force components into a dense (3, *shape) tensor
      F = torch.zeros(3, *shape, dtype=torch.float32)
      F[0, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['force_x'].values, dtype=torch.float32)
      F[1, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['force_y'].values, dtype=torch.float32)
      F[2, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['force_z'].values, dtype=torch.float32)
    
      # Binary Dirichlet masks, one channel per axis direction
      ω_Dirichlet = torch.zeros(3, *shape, dtype=torch.float32)
      ω_Dirichlet[0, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['dirichlet_x'].values, dtype=torch.float32)
      ω_Dirichlet[1, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['dirichlet_y'].values, dtype=torch.float32)
      ω_Dirichlet[2, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['dirichlet_z'].values, dtype=torch.float32)
    
      # Ternary design-space constraints (-1 free, 0 fixed void, 1 fixed material)
      ω_design = torch.zeros(1, *shape, dtype=int)
      ω_design[:, voxels[0], voxels[1], voxels[2]] = torch.from_numpy(data['design_space'].values.astype(int))
      return F, ω_Dirichlet, ω_design

    The corresponding {i}_info.csv files only have one row with column labels ['E', 'ν', 'σ_ys', 'vox_size', 'p_x', 'p_y', 'p_z'].

    • E - Young's modulus [Pa]
    • ν - Poisson's ratio [-]
    • σ_ys - Yield stress [Pa]
    • vox_size - Length of the edge of a (cube-shaped) voxel [m]
    • p_x, p_y, p_z - Location of the root of the design space [m]

    Analogously to above, one can import any {i}_info.csv file by executing:

    file_path = f'{directory}/{i}_info.csv'
    data_info_column_names = ['E', 'ν', 'σ_ys', 'vox_size', 'p_x', 'p_y', 'p_z']
    data_info = pd.read_csv(file_path, names=data_info_column_names)
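    Putting the snippets above together, a minimal sketch for loading one sample i and assembling all tensors (it assumes the functions and column name lists defined above are in scope, pandas and torch are imported as before, and the files for index i exist in directory):

    i = 0
    directory = 'path/to/dataset'  # placeholder

    data = pd.read_csv(f'{directory}/{i}.csv', names=column_names)
    data_info = pd.read_csv(f'{directory}/{i}_info.csv', names=data_info_column_names)

    shape, voxels = get_shape_and_voxels(data)
    F, ω_Dirichlet, ω_design = get_forces_boundary_conditions_and_design_space(data, shape, voxels)

    # The ground-truth density can be scattered onto the grid in the same way
    density = torch.zeros(1, *shape, dtype=torch.float32)
    density[0, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['density'].values, dtype=torch.float32)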

  13. Data from: Algoritmo de detección de odio en español (Algorithm for...

    • produccioncientifica.ucm.es
    • investigacion.unir.net
    • +1more
    Updated 2024
    Cite
    Said-Hung, Elias; Montero-Diaz, Julio; De Gregorio Vicente, Oscar; Ruiz-Iniesta, Almudena; Blanco Valencia, Xiomara; José Cubillas, Juan; Pérez Palau, Daniel (2024). Algoritmo de detección de odio en español (Algorithm for detection of hate speech in Spanish) [Dataset]. https://produccioncientifica.ucm.es/documentos/67321c66aea56d4af0483395
    Explore at:
    Dataset updated
    2024
    Authors
    Said-Hung, Elias; Montero-Diaz, Julio; De Gregorio Vicente, Oscar; Ruiz-Iniesta, Almudena; Blanco Valencia, Xiomara; José Cubillas, Juan; Pérez Palau, Daniel
    Description

    Algorithm for the detection of hate expressions in Spanish. This algorithm was developed within the framework of the Hatemedia project (PID2020-114584GB-I00), funded by MCIN/AEI/10.13039/501100011033, with the collaboration of Possible Inc.

    The folder structure with the GitHub documentation is as follows:

    02 Documentación Github
    └── 00_Odio y no odio
        ├── DOCUMENTACIÓN GITHUB.docx
        ├── ejemplo (1).py
        ├── Modelo_binario_ (1) (1).ipynb
        ├── obtener_caracteristicas (1).py
        └── Recursos-20231027T110710Z-001 (1).zip

    The content of each file is detailed below:

    • DOCUMENTACIÓN GITHUB.docx: report presenting how to use the scripts ejemplo (1).py and obtener_caracteristicas (1).py with the models.
    • ejemplo (1).py: Python script showing how to use the models to make predictions.
    • Modelo_binario_ (1) (1).ipynb: notebook with the code used to train the different models.
    • obtener_caracteristicas (1).py: Python script with the preprocessing functions applied before using the models to predict the entries of a data frame.
    • Recursos-20231027T110710Z-001 (1).zip: the resources folder contains 3 CSV files used in feature extraction.

    The dataset used for training the models is dataset_completo_caracteristicas_ampliadas_todas_combinaciones_v1_textoProcesado.csv (https://acortar.link/diSV7o).

    The algorithm was developed from the tests of the applied models shown below:

    MODELOS
    ├── 70-30
    │   ├── CART_binario_70-30.joblib
    │   ├── GB_binario_70-30.joblib
    │   ├── MLP_binario_70-30.joblib
    │   ├── NB_binario_70-30.joblib
    │   ├── RF_binario_70-30.joblib
    │   └── SVM_binario_70-30.joblib
    ├── 80-20
    │   ├── CART_binario_80-20.joblib
    │   ├── GB_binario_80-20.joblib
    │   ├── MLP_binario_80-20.joblib
    │   ├── NB_binario_80-20.joblib
    │   ├── RF_binario_80-20.joblib
    │   └── SVM_binario_80-20.joblib
    └── 90-10
        ├── CART_binario_90-10.joblib
        ├── GB_binario_90-10.joblib
        ├── MLP_binario_90-10.joblib
        ├── NB_binario_90-10.joblib
        ├── RF_binario_90-10.joblib
        └── SVM_binario_90-10.joblib

    The folders 70-30, 80-20 and 90-10 contain the models already trained with the respective train/test percentages. Results and comparisons generated during the training and validation of the final model used for the algorithm's development are shared in the MODELOS folder and in the document Comparativa_V2.xlsx (both uploaded on GitHub). The procedure followed to train the models is described in the technical report on the development of the hate/non-hate classification algorithm for Spanish digital news media on X (Twitter), Facebook and web portals (https://doi.org/10.6084/m9.figshare.26085688.v1).

    Authors: Elias Said-Hung, Julio Montero-Diaz, Oscar De Gregorio, Almudena Ruiz-Iniesta, Xiomara Blanco, Juan José Cubillas, Daniel Pérez Palau.

    Funded by: Agencia Estatal de Investigación, Ministerio de Ciencia e Innovación. With the support of POSSIBLE S.L.

    How to cite: Said-Hung, E., Montero-Diaz, J., De Gregorio Vicente, O., Ruiz-Iniesta, A., Blanco Valencia, X., José Cubillas, J., and Pérez Palau, D. (2023), "Algorithm for classifying hate expressions in Spanish", figshare. https://doi.org/10.6084/m9.figshare.24574906.

    More information: https://www.hatemedia.es/ or contact elias.said@unir.net
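    As a rough sketch of how one of the shared .joblib models might be applied (it assumes the input features have already been extracted with the project's obtener_caracteristicas script so that X matches the training feature layout; the feature file name is hypothetical):

    import joblib
    import pandas as pd

    # Load one of the trained binary models (path as in the MODELOS tree above)
    model = joblib.load("90-10/RF_binario_90-10.joblib")

    # X: DataFrame of preprocessed features produced by the project's preprocessing script
    X = pd.read_csv("features.csv")
    predictions = model.predict(X)   # binary hate / non-hate predictions
    print(predictions[:10])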

  14. 3D skeletons UP-Fall Dataset

    • zenodo.org
    zip
    Updated Jul 20, 2024
    Cite
    Tresor KOFFI (2024). 3D skeletons UP-Fall Dataset [Dataset]. http://doi.org/10.5281/zenodo.12773013
    Explore at:
    zip (available download formats)
    Dataset updated
    Jul 20, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Tresor KOFFI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 18, 2024
    Description

    3D skeletons UP-Fall Dataset

    Difference between Fall and Impact detection

    Overview

    This dataset aims to facilitate research in fall detection, particularly focusing on the precise detection of impact moments within fall events. The accuracy and comprehensiveness of the 3D skeleton data make it a valuable resource for developing and benchmarking fall detection algorithms. The dataset contains 3D skeletal data extracted from fall events and daily activities of 5 subjects performing fall scenarios.

    Data Collection

    The skeletal data was extracted using a pose estimation algorithm, which processes image frames to determine the 3D coordinates of each joint. Sequences with fewer than 100 frames of extracted data were excluded to ensure the quality and reliability of the dataset. As a result, some subjects may have fewer CSV files.

    CSV Structure

    The data is organized by subjects, and each subject contains CSV files named according to the pattern C1S1A1T1, where:

    • C: Camera (1 or 2)
    • S: Subject (1 to 5)
    • A: Activity (1 to N, representing different activities)
    • T: Trial (1 to 3)

    subject1/: Contains CSV files for Subject 1.

    • C1S1A1T1.csv: Data from Camera 1, Activity 1, Trial 1 for Subject 1
    • C1S1A2T1.csv: Data from Camera 1, Activity 2, Trial 1 for Subject 1
    • C1S1A3T1.csv: Data from Camera 1, Activity 3, Trial 1 for Subject 1
    • C2S1A1T1.csv: Data from Camera 2, Activity 1, Trial 1 for Subject 1
    • C2S1A2T1.csv: Data from Camera 2, Activity 2, Trial 1 for Subject 1
    • C2S1A3T1.csv: Data from Camera 2, Activity 3, Trial 1 for Subject 1

    subject2/: Contains CSV files for Subject 2.

    • C1S2A1T1.csv: Data from Camera 1, Activity 1, Trial 1 for Subject 2
    • C1S2A2T1.csv: Data from Camera 1, Activity 2, Trial 1 for Subject 2
    • C1S2A3T1.csv: Data from Camera 1, Activity 3, Trial 1 for Subject 2
    • C2S2A1T1.csv: Data from Camera 2, Activity 1, Trial 1 for Subject 2
    • C2S2A2T1.csv: Data from Camera 2, Activity 2, Trial 1 for Subject 2
    • C2S2A3T1.csv: Data from Camera 2, Activity 3, Trial 1 for Subject 2

    subject3/, subject4/, subject5/: Similar structure as above, but may contain fewer CSV files due to the data extraction criteria mentioned above.

    Column Descriptions

    Each CSV file contains the following columns representing different skeletal joints and their respective coordinates in 3D space:

    • joint_1_x - X coordinate of joint 1
    • joint_1_y - Y coordinate of joint 1
    • joint_1_z - Z coordinate of joint 1
    • joint_2_x - X coordinate of joint 2
    • joint_2_y - Y coordinate of joint 2
    • joint_2_z - Z coordinate of joint 2
    • ...
    • joint_n_x - X coordinate of joint n
    • joint_n_y - Y coordinate of joint n
    • joint_n_z - Z coordinate of joint n
    • LABEL - Label indicating impact (1) or non-impact (0)

    Example

    Here is an example of what a row in one of the CSV files might look like:

    joint_1_x, joint_1_y, joint_1_z, joint_2_x, joint_2_y, joint_2_z, ..., joint_n_x, joint_n_y, joint_n_z, LABEL
    0.123, 0.456, 0.789, 0.234, 0.567, 0.890, ..., 0.345, 0.678, 0.901, 0

    Usage

    This data can be used for developing and benchmarking impact fall detection algorithms. It provides detailed information on human posture and movement during falls, making it suitable for machine learning and deep learning applications in impact fall detection and prevention.

    Using GitHub

    1. Clone the repository:

      git clone https://github.com/Tresor-Koffi/3D_skeletons-UP-Fall-Dataset

    2. Navigate to the directory:

      cd 3D_skeletons-UP-Fall-Dataset

    Examples

    Here's a simple example of how to load and inspect a sample data file using Python:
    ```python
    import pandas as pd

    # Load a sample data file for Subject 1, Camera 1, Activity 1, Trial 1

    data = pd.read_csv('subject1/C1S1A1T1.csv')
    print(data.head())
    ```

  15. Dataset with four years of condition monitoring technical language...

    • data.europa.eu
    unknown
    Cite
    Luleå Tekniska universitet, Dataset with four years of condition monitoring technical language annotations from paper machine industries in northern Sweden [Dataset]. https://data.europa.eu/data/datasets/https-doi-org-10-5878-hafd-ms27~~1?locale=bg
    Explore at:
    unknown (available download formats)
    Dataset authored and provided by
    Luleå Tekniska universitet
    Area covered
    Sverige
    Description

    This dataset consists of technical-language annotations collected over four years from two paper machines in northern Sweden, structured as a Pandas dataframe. The same data is also available as a semicolon-separated .csv file. The data consists of two columns, where the first column corresponds to the text content of the annotation and the second to its title. The annotations are written in Swedish and have been processed so that all proper nouns are replaced by the text string 'egennamn'. Each row corresponds to one annotation with its title.

    The data can be processed in Python with:

    import pandas as pd
    annotations_df = pd.read_pickle("Technical_Language_Annotations.pkl")
    annotation_contents = annotations_df['noteComment']
    annotation_titles = annotations_df['title']

  16. 1117 Russian cities with city name, region, geographic coordinates and 2020...

    • zenodo.org
    • explore.openaire.eu
    csv
    Updated Aug 6, 2021
    Cite
    Evgeniy Pogrebnyak; Kirill Artemov (2021). 1117 Russian cities with city name, region, geographic coordinates and 2020 population estimate [Dataset]. http://doi.org/10.5281/zenodo.5151423
    Explore at:
    csvAvailable download formats
    Dataset updated
    Aug 6, 2021
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Evgeniy Pogrebnyak; Evgeniy Pogrebnyak; Kirill Artemov; Kirill Artemov
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Russia
    Description

    1117 Russian cities with city name, region, geographic coordinates and 2020 population estimate.

    How to use

    from pathlib import Path
    import requests
    import pandas as pd
    
    url = ("https://raw.githubusercontent.com/"
       "epogrebnyak/ru-cities/main/assets/towns.csv")
    
    # save file locally
    p = Path("towns.csv")
    if not p.exists():
      content = requests.get(url).text
      p.write_text(content, encoding="utf-8")
    
    # read as dataframe
    df = pd.read_csv("towns.csv")
    print(df.sample(5))

    Files:

    Columns (towns.csv):

    Basic info:

    • city - city name (several cities have alternative names marked in alt_city_names.json)
    • population - city population, thousand people, Rosstat estimate as of 1.1.2020
    • lat,lon - city geographic coordinates

    Region:

    • region_name - subnational region (oblast, republic, krai or AO)
    • region_iso_code - ISO 3166 code, e.g. RU-VLD
    • federal_district, e.g. Центральный

    City codes:

    • okato
    • oktmo
    • fias_id
    • kladr_id
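
    As a quick illustration of these columns, here is a minimal sketch that looks up one city after loading towns.csv as shown above; the city name is only an example.

    import pandas as pd

    df = pd.read_csv("towns.csv")

    # Select a single city by name and show its coordinates and region
    city = df[df["city"] == "Казань"]
    print(city[["city", "population", "lat", "lon", "region_name", "region_iso_code", "federal_district"]])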

    Data sources

    Comments

    City groups

    • Ханты-Мансийский and Ямало-Ненецкий autonomous regions are excluded to avoid duplication, as they are parts of Тюменская область.

    • Several notable towns are classified as administrative parts of larger cities (Сестрорецк is a municipality within Saint Petersburg, Щербинка is part of Moscow). They are not reported in this dataset.

    By individual city

    Alternative city names

    • The letter "ё" is suppressed in the city column of towns.csv - we have Орел, but not Орёл. This affected:

      • Белоозёрский
      • Королёв
      • Ликино-Дулёво
      • Озёры
      • Щёлково
      • Орёл
    • Дмитриев and Дмитриев-Льговский are the same city.

    assets/alt_city_names.json contains these names.
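
    A minimal sketch of applying these alternative names is shown below. It assumes, for illustration only, that the JSON maps an alternative spelling to the spelling used in towns.csv; check the file for its actual structure before relying on it.

    import json

    # Load the alternative-name mapping shipped with the dataset
    with open("assets/alt_city_names.json", encoding="utf-8") as f:
        alt_names = json.load(f)

    # Fall back to the original spelling when no mapping exists (structure assumed)
    name = "Орёл"
    print(alt_names.get(name, name))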

    Tests

    poetry install
    poetry run python -m pytest
    

    How to replicate dataset

    1. Base dataset

    Run:

    • download the source data with rar/get.sh
    • convert Саратовская область.doc to docx
    • run make.py

    Creates:

    • _towns.csv
    • assets/regions.csv

    2. API calls

    Note: do not run this step unless you have to - it takes a while and puts load on third-party APIs.

    The resulting files are already in the repo, so you probably do not need to run these scripts.

    Run:

    • cd geocoding
    • run coord_dadata.py (needs token)
    • run coord_osm.py

    Creates:

    • coord_dadata.csv
    • coord_osm.csv

    3. Merge data

    Run:

    • run merge.py

    Creates:

    • assets/towns.csv

  17. Additional file 6: of Gene overlapping and size constraints in the viral...

    • springernature.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Nadav Brandes; Michal Linial (2023). Additional file 6: of Gene overlapping and size constraints in the viral world [Dataset]. http://doi.org/10.6084/m9.figshare.c.3631700_D1.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Nadav Brandes; Michal Linial
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    World
    Description

    VIPERdb clean data. Contains structural data about the capsids of icosahedral viral genera, as taken from VIPERdb after merging together records of the same genus (see Methods). Rename this file to "viperdb_clean.csv" in order to load it through our Python framework. (CSV 6 kb)
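
    A minimal sketch of preparing the file as described is given below; the name of the downloaded file is hypothetical, and pandas is used only as a quick sanity check - the renamed file is meant to be loaded through the authors' Python framework.

    from pathlib import Path
    import pandas as pd

    # Rename the downloaded additional file (original name here is hypothetical)
    Path("additional_file_6.txt").rename("viperdb_clean.csv")

    # Quick look at the contents
    print(pd.read_csv("viperdb_clean.csv").head())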


Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild

Data Import: Setting up a MongoDB (Recommended)

To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data by importing the LifeSnaps MongoDB database.

To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have the MongoDB Database Tools installed.

For the Fitbit data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c fitbit 

For the SEMA data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c sema 

For surveys data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c surveys 

If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
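
Once the collections are restored, they can also be queried from Python. Below is a minimal sketch using pymongo and pandas; pymongo is an assumption here (any MongoDB client works), while the database and collection names match the restore commands above.

from pymongo import MongoClient
import pandas as pd

# Connect to the local MongoDB instance holding the restored database
client = MongoClient("localhost", 27017)
db = client["rais_anonymized"]

# Read a small sample of Fitbit documents into a DataFrame
docs = list(db["fitbit"].find().limit(5))
print(pd.DataFrame(docs).head())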

Data Availability

The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain information related to these collections. Each document in any collection follows the format shown below:

{
  _id: 