Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LifeSnaps Dataset Documentation
Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in-the-wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements, due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset containing a plethora of anthropological data, collected unobtrusively over the course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.
The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.
Data Import: Reading CSV
For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
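For instance, a daily-granularity export can be loaded as follows (the file name below is illustrative, not the actual name of a released file):

```python
import pandas as pd

# Hypothetical file name -- substitute the name of the LifeSnaps CSV you downloaded
daily_df = pd.read_csv("lifesnaps_daily.csv")
print(daily_df.head())
```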
Data Import: Setting up a MongoDB (Recommended)
To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.
To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have the MongoDB Database Tools installed.
For the Fitbit data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c fitbit
For the SEMA data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c sema
For surveys data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c surveys
If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
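Once the collections are restored, they can be queried directly from Python. A minimal sketch using pymongo, assuming a default local MongoDB instance (the database and collection names follow the commands above):

```python
from pymongo import MongoClient

# Assumes MongoDB is running locally and the dump was restored as shown above
client = MongoClient("mongodb://localhost:27017")
db = client["rais_anonymized"]

print(db["fitbit"].count_documents({}))  # number of Fitbit documents
print(db["fitbit"].find_one())           # inspect the structure of one document
```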
Data Availability
The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain information related to these collections. Each document in any collection follows the format shown below:
{
_id:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present Code4ML: a Large-scale Dataset of annotated Machine Learning Code, a corpus of Python code snippets, competition summaries, and data descriptions from Kaggle.
The data is organized in a table structure. Code4ML includes several main objects: competition information, raw code blocks collected from Kaggle, and manually marked-up snippets. Each table is in .csv format.
Each competition has a text description and metadata reflecting the characteristics of the competition and the dataset used, as well as the evaluation metrics (competitions.csv). The corresponding datasets can be loaded using the Kaggle API and data sources.
The code blocks themselves and their metadata are collected into data frames according to the publishing year of the initial kernels. The current version of the corpus includes two code block files: snippets from kernels up to the year 2020 (code_blocks_upto_20.csv) and those from 2021 (code_blocks_21.csv), with corresponding metadata. The corpus consists of 2,743,615 ML code blocks collected from 107,524 Jupyter notebooks.
Marked up code blocks have the following metadata: anonymized id, the format of the used data (for example, table or audio), the id of the semantic type, a flag for the code errors, the estimated relevance to the semantic class (from 1 to 5), the id of the parent notebook, and the name of the competition. The current version of the corpus has ~12 000 labeled snippets (markup_data_20220415.csv).
Since the marked-up code block data contains the numeric id of each block's semantic type, we also provide a mapping from this number to the semantic type and subclass (actual_graph_2022-06-01.csv).
The dataset can help solve various problems, including code synthesis from a prompt in natural language, code autocompletion, and semantic code classification.
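As a starting point, the tables can be loaded with pandas (file names as listed above; column layouts should be checked against the released files):

```python
import pandas as pd

competitions = pd.read_csv("competitions.csv")
code_blocks = pd.read_csv("code_blocks_upto_20.csv")
markup = pd.read_csv("markup_data_20220415.csv")

print(competitions.shape, code_blocks.shape, markup.shape)
```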
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Spectra data generated for "Color Classification of Earth-like Planets with Machine Learning" (https://academic.oup.com/mnras/advance-article-abstract/doi/10.1093/mnras/stab1144/6247611).
The flux (units: W/m^2) can be accessed in the flux.pk (pickle file) or flux.csv (comma-separated file). These files also contain the biota information and composition of various surfaces. There are 318,780 spectra generated in total. The spectra include a 6 km cloud layer and Rayleigh scattering. The surface compositions are: cloud, seawater, sand, snow, and biota (six kinds). Each composition is sampled at 5% resolution.
The wavelength (units: micrometer) can be accessed in the wavelength.pk (pickle file) or wavelength.csv (comma-separated file). The wavelength ranges from 0.36 micrometers to 1.1 micrometers, with 1000 sampling points.
To access the pickle file using Python:
import pickle
import pandas
wavelength_dataframe = pickle.load(open('wavelength.pk', 'rb'))
flux_dataframe = pickle.load(open('flux.pk', 'rb'))
The objects loaded by the pickle files will be Pandas dataframes.
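As a quick sanity check after loading, the dataframe shapes should reflect the totals stated above (318,780 spectra and 1000 wavelength samples); the exact orientation of rows and columns should be verified against your copy of the files:

print(wavelength_dataframe.shape)
print(flux_dataframe.shape)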
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a test dataset taken by a CERN@school Timepix detector in Comma Separated Value (CSV) format. It consists of the frame data for three 256x256 pixel frames, with each frame's data in a separate file. The original binary format data may be found at the figshare link below. The data themselves are the readings from the pixels (X, Y, number of counts) caused by particles incident on the Timepix detector's silicon sensor element when exposed to a potassium chloride source. Three frames were taken with an acquisition time of 60 seconds. Further information may be found on the CERN@school website. A simple frame display (written in Python, with matplotlib) may be found in the GitHub repository linked to below.
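A minimal sketch of such a frame display, assuming each frame's CSV holds three comma-separated columns (pixel X, pixel Y, counts) with no header; the actual file names and column layout should be checked against the release and the linked repository:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file name; columns assumed to be X, Y, counts
hits = pd.read_csv("frame_0.csv", names=["x", "y", "counts"])

frame = np.zeros((256, 256))
frame[hits["y"].to_numpy(), hits["x"].to_numpy()] = hits["counts"].to_numpy()

plt.imshow(frame, origin="lower", cmap="viridis")
plt.colorbar(label="counts")
plt.show()
```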
GNU General Public License 3.0: https://www.gnu.org/licenses/gpl-3.0-standalone.html
This repository provides access to five pre-computed reconstruction files as well as the static polygons and rotation files used to generate them. This set of palaeogeographic reconstruction files provides palaeocoordinates for three global grids at H3 resolutions 2, 3, and 4, which have an average cell spacing of ~316 km, ~119 km, and ~45 km, respectively. Grids were reconstructed at a temporal resolution of one million years throughout the entire Phanerozoic (540–0 Ma). The reconstruction files are stored as comma-separated-value (CSV) files which can be easily read by almost any spreadsheet program (e.g. Microsoft Excel and Google Sheets) or programming language (e.g. Python, Julia, and R). In addition, R Data Serialization (RDS) files—a common format for saving R objects—are also provided as lighter (and compressed) alternatives to the CSV files. The structure of the reconstruction files follows a wide-form data frame structure to ease indexing. Each file consists of three initial index columns relating to the H3 cell index (i.e. the 'H3 address'), present-day longitude of the cell centroid, and the present-day latitude of the cell centroid. The subsequent columns provide the reconstructed longitudinal and latitudinal coordinate pairs for their respective age of reconstruction in ascending order, indicated by a numerical suffix. Each row contains a unique spatial point on the Earth's continental surface reconstructed through time. NA values within the reconstruction files indicate points which are not defined in deeper time (i.e. either the static polygon does not exist at that time, or it is outside the temporal coverage as defined by the rotation file).
The following five Global Plate Models are provided (abbreviation, temporal coverage, reference) within the GPMs folder:
WR13, 0–550 Ma, (Wright et al., 2013)
MA16, 0–410 Ma, (Matthews et al., 2016)
TC16, 0–540 Ma, (Torsvik and Cocks, 2016)
SC16, 0–1100 Ma, (Scotese, 2016)
ME21, 0–1000 Ma, (Merdith et al., 2021)
In addition, the H3 grids for resolutions 2, 3, and 4 are provided within the grids folder. Finally, we also provide two scripts (Python and R) within the code folder which can be used to generate reconstructed coordinates for user data from the reconstruction files.
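A sketch of reading one reconstruction file and extracting the coordinates for a single reconstruction age; the file name and the exact column names/suffixes below are assumptions, so check the header of the file you use:

```python
import pandas as pd

# Hypothetical file name from one of the Global Plate Model folders
recon = pd.read_csv("GPMs/ME21/reconstruction_res3.csv")

# First three columns: H3 cell index, present-day lon, present-day lat;
# later columns hold reconstructed lon/lat pairs suffixed by age
# (here assumed to be 'lng_100'/'lat_100' for 100 Ma)
coords_100 = recon[[recon.columns[0], "lng_100", "lat_100"]].dropna()
print(coords_100.head())
```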
For access to the code used to generate these files:
https://github.com/LewisAJones/PhanGrids
For more information, please refer to the article describing the data:
Jones, L.A. and Domeier, M.M. (2024). A Phanerozoic gridded dataset for palaeogeographic reconstructions.
For any additional queries, contact:
Lewis A. Jones (lewisa.jones@outlook.com) or Mathew M. Domeier (mathewd@uio.no)
If you use these files, please cite:
Jones, L.A. and Domeier, M.M. 2024. A Phanerozoic gridded dataset for palaeogeographic reconstructions. DOI: 10.5281/zenodo.10069221
References
Matthews, K. J., Maloney, K. T., Zahirovic, S., Williams, S. E., Seton, M., & Müller, R. D. (2016). Global plate boundary evolution and kinematics since the late Paleozoic. Global and Planetary Change, 146, 226–250. https://doi.org/10.1016/j.gloplacha.2016.10.002.
Merdith, A. S., Williams, S. E., Collins, A. S., Tetley, M. G., Mulder, J. A., Blades, M. L., Young, A., Armistead, S. E., Cannon, J., Zahirovic, S., & Müller, R. D. (2021). Extending full-plate tectonic models into deep time: Linking the Neoproterozoic and the Phanerozoic. Earth-Science Reviews, 214, 103477. https://doi.org/10.1016/j.earscirev.2020.103477.
Scotese, C. R. (2016). Tutorial: PALEOMAP paleoAtlas for GPlates and the paleoData plotter program: PALEOMAP Project, Technical Report.
Torsvik, T. H., & Cocks, L. R. M. (2017). Earth history and palaeogeography. Cambridge University Press. https://doi.org/10.1017/9781316225523.
Wright, N., Zahirovic, S., Müller, R. D., & Seton, M. (2013). Towards community-driven paleogeographic reconstructions: Integrating open-access paleogeographic and paleobiology data with plate tectonics. Biogeosciences, 10, 1529–1541. https://doi.org/10.5194/bg-10-1529-2013.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The ARtracks Atmospheric River Catalogue is based on the ERA5 climate reanalysis dataset, specifically the output parameters "vertical integral of east-/northward water vapour flux". Most of the processing relies on IPART (Image-Processing based Atmospheric River (AR) Tracking, https://github.com/ihesp/IPART), a Python package for automated AR detection, axis finding and AR tracking. The catalogue is provided as a pickled pandas.DataFrame as well as a CSV file.
For detailed information, please see https://github.com/dominiktraxl/artracks.
The ARtracks catalogue covers the years from 1979 to the end of the year 2019.
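Either distribution of the catalogue can be loaded with pandas; the file names below are illustrative, so use the names from the actual download:

```python
import pandas as pd

ar_tracks = pd.read_pickle("artracks_catalogue.pkl")  # pickled DataFrame
# ar_tracks = pd.read_csv("artracks_catalogue.csv")   # CSV alternative
print(ar_tracks.head())
```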
Public Domain: https://creativecommons.org/licenses/publicdomain/
This repository contains data on 17,419 DOIs cited in the IPCC Working Group 2 contribution to the Sixth Assessment Report, and the code to link them to the dataset built at the Curtin Open Knowledge Initiative (COKI).
References were extracted from the report's PDFs (downloaded 2022-03-01) via Scholarcy and exported as RIS and BibTeX files. DOI strings were identified from the RIS files by pattern matching and saved as a CSV file. The list of DOIs for each chapter and cross-chapter paper was processed using a custom Python script to generate a pandas DataFrame, which was saved as a CSV file and uploaded to Google BigQuery.
We used the main object table of the Academic Observatory, which combines information from Crossref, Unpaywall, Microsoft Academic, Open Citations, the Research Organization Registry and Geonames, to enrich the DOIs with bibliographic information, affiliations, and open access status. A custom query was used to join and format the data, and the resulting table was visualised in a Google Data Studio dashboard.
This version of the repository also includes the set of DOIs from references in the IPCC Working Group 1 contribution to the Sixth Assessment Report as extracted by Alexis-Michel Mugabushaka and shared on Zenodo: https://doi.org/10.5281/zenodo.5475442 (CC-BY)
A brief descriptive analysis was provided as a blogpost on the COKI website.
The repository contains the following content:
Data:
data/scholarcy/RIS/ - extracted references as RIS files
data/scholarcy/BibTeX/ - extracted references as BibTeX files
IPCC_AR6_WGII_dois.csv - list of DOIs
data/10.5281_zenodo.5475442/ - references from IPCC AR6 WG1 report
Processing:
preprocessing.R - preprocessing steps for identifying and cleaning DOIs
process.py - Python script for transforming data and linking to COKI data through Google BigQuery
Outcomes:
Dataset on BigQuery - requires a Google account for access and a BigQuery account for querying
Data Studio Dashboard - interactive analysis of the generated data
Zotero library of references extracted via Scholarcy
PDF version of blogpost
Note on licenses: Data are made available under CC0 (with the exception of WG1 reference data, which have been shared under CC-BY 4.0). Code is made available under the Apache License 2.0.
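As a minimal starting point for re-using the DOI list (the file name is taken from the contents listing above; column names should be inspected before further processing):

```python
import pandas as pd

dois = pd.read_csv("IPCC_AR6_WGII_dois.csv")
print(len(dois), "rows")
print(dois.head())
```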
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a collection of synthetically generated Question-Answer (Q&A) pairs on sustainable fashion and style, with an emphasis on timeless wardrobe pieces, sustainable choices, and capsule wardrobe principles. The data was created using a large language model with advanced reasoning, prompted with various grounded contexts and real-world examples. It can be used to train or evaluate models that specialize in sustainable fashion advice, styling recommendations, or instruction-following tasks.
Context: The data focuses on classic, long-lasting wardrobe recommendations. Topics include choosing neutral color palettes, selecting high-quality fabrics (like wool), finding universally flattering silhouettes, and embracing sustainability in fashion choices...
Structure: Each entry contains two primary fields:
- instruction – the user's question or prompt
- response – the corresponding answer or advice

Example Entry (truncated for clarity):
```csv
instruction,response
"What makes a neutral color palette so timeless?", "Neutral tones like black, navy, beige, and gray offer unmatched versatility..."
```
Synthetic Creation:
This dataset is synthetic—the questions and answers were generated by a large language model. The prompts used in creation were seeded with diverse real-world fashion contexts and examples to ensure groundedness and practical relevance.
Advanced Reasoning:
The large language model was employed to simulate more detailed and nuanced fashion advice, making each Q&A pair comprehensive yet concise. Despite the synthetic nature, the reasoning incorporates established fashion principles and best practices.
| Column Name | Description |
|---|---|
| instruction | A concise question related to fashion, style tips, capsule wardrobes, or sustainability. |
| response | A short, detailed answer offering timeless styling advice, illustrating best practices in fashion. |
Potential use cases:
- Sustainable Fashion Chatbot/Assistant
- Instruction-Following/QA Models
- Content Generation
- Sustainable Fashion Product Descriptions
Download the Dataset
The CSV file contains the two columns instruction and response.
Data Preprocessing
Sample Use
```python
import csv

data = []
with open('sustainable_fashion.csv', 'r', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    for row in reader:
        data.append(row)

print("Question:", data[0]['instruction'])
print("Answer:", data[0]['response'])
```
A possible system prompt when using the data with a chat model: "You are a fashion advisor. Provide concise, accurate style guidance."
- Maintain Consistency: keep the roles of instruction and response consistent. Models often learn better with clearly defined roles.
- Supplementary Data:
- Evaluate Quality:
- Ethical and Inclusive Language:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Austin's data portal activity metrics’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/data-portal-activity-metricse on 13 February 2022.
--- Dataset description provided by original source is as follows ---
Background
Austin's open data portal provides lots of public data about the City of Austin. It also provides portal administrators with behind-the-scenes information about how the portal is used... but that data is mysterious, hard to handle in a spreadsheet, and not located all in one place.
Until now! Authorized city staff used admin credentials to grab this usage data and share it with the public. The City of Austin wants to use this data to inform the development of its open data initiative and manage the open data portal more effectively.
This project contains related datasets for anyone to explore. These include site-level metrics, dataset-level metrics, and department information for context. A detailed description of how the files were prepared (along with code) can be found on GitHub.
Example questions to answer about the data portal
- What parts of the open data portal do people seem to value most?
- What can we tell about who our users are?
- How are our data publishers doing?
- How much data is published programmatically vs manually?
- How much data is super fresh? Super stale?
- Whatever you think we should know...
About the files
all_views_20161003.csv
There is a resource available to portal administrators called "Dataset of datasets". This is the export of that resource, and it was captured on Oct 3, 2016. It contains a summary of the assets available on the data portal. While this file contains over 1400 resources (such as views, charts, and binary files), only 363 are actual tabular datasets.
table_metrics_ytd.csv
This file contains information about the 363 tabular datasets on the portal. Activity metrics for an individual dataset can be accessed by calling Socrata's views/metrics API and passing along the dataset's unique ID, a time frame, and admin credentials. The process of obtaining the 363 identifiers, calling the API, and staging the information can be reviewed in the accompanying Python notebook.
site_metrics.csv
This file is the export of site-level stats that Socrata generates using a given time frame and grouping preference. This file contains records about site usage each month from Nov 2011 through Sept 2016. By the way, it contains 285 columns... and we don't know what many of them mean. But we are determined to find out!! For a preliminary exploration of the columns and the portal-related business processes to which they might relate, check out the notes in the accompanying Python notebook.
city_departments_in_current_budget.csv
This file contains a list of all City of Austin departments according to how they're identified in the most recently approved budget documents. Could be helpful for getting to know more about who the publishers are.
crosswalk_to_budget_dept.csv
The City is in the process of standardizing how departments identify themselves on the data portal. In the meantime, here's a crosswalk from the department values observed in all_views_20161003.csv to the department names that appear in the City's budget.

This dataset was created by Hailey Pate and contains around 100 samples along with Di Sync Success, Browser Firefox 19, technical information and other features such as:
- Browser Firefox 33
- Di Sync Failed
- and more
- Analyze Sf Query Error User in relation to Js Page View Admin
- Study the influence of Browser Firefox 37 on Datasets Created
- More datasets
If you use this dataset in your research, please credit Hailey Pate
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contains data from a single 1-photon widefield imaging experiment and a single Thorlabs mesoscope 2-photon imaging session from the same side mount mouse, corresponding to panels a-c and d of Figure 3, respectively. Included files contain imaging data, behavioral data, and python files with combined neurobehavioral data.

Note that session names have the following format: "mouse#_bigEndianDate_cage#_info_session#_attempt#". Raw mesoscope imaging data is included in ScanImage rendered format as single big tiffs with the following nomenclature: "filename_2D.tiff". Mouse face and body cam images are included as standalone or concatenated .avi movie files, and behavioral data is included both as Spike2 files (smrx) and in exported form as Matlab data files (.mat).

In all cases the first frame of the 2-photon movie, the right face/body movie, and Spike2 data are aligned to the first Labview-issued frameclock trigger (also recorded in Spike2, along with all other frameclock events). 2-photon triggers were sometimes incorrectly recorded in Spike2 (generally we recorded these as both events and waveforms), but were in all cases additionally exported from ScanImage tiff metadata as timestamps (csv files ending in header.csv). Session start-time timestamps, also exported from ScanImage tiff metadata, appear as .txt files ending in "_starttime.txt".

Preprocessed data (python) can be found in npy files with various names, each containing different subsets of variables relevant to the analysis. For each session, the npy file containing the string "standard_frames" contains the most complete, final-stage set of preprocessed neurobehavioral data (in combined DataFrame format, exportable to nwb), including CCF/MMM alignments. The file containing the string "nb_dump" contains a large set of auxiliary variables that may be needed for additional preprocessing.

Additional image files (tiff, png) and excel worksheets (xlsx, csv) containing high-level data summaries and records of intermediate analysis steps are also included. Please contact the authors for any additional clarifications as needed. See related materials in Collection at: https://doi.org/10.25452/figshare.plus.c.7052513
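The preprocessed .npy files can be opened from Python; because they appear to hold Python objects (e.g. DataFrames) rather than plain numeric arrays, allow_pickle is likely required. The file name below is illustrative:

```python
import numpy as np

# Hypothetical file name for a session's "standard_frames" preprocessed data
data = np.load("session_standard_frames.npy", allow_pickle=True)
print(type(data), getattr(data, "shape", None))
```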
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contains data from a single 1-photon widefield imaging experiment and a single Thorlabs mesoscope 2-photon imaging session from the same side mount mouse as in Figures 5 and 6 (but a different session than in those Figures), corresponding to panels d-f of Figure S6. Included files contain imaging data, behavioral data, and python files with combined neurobehavioral data. Additional python files corresponding to this session can be found in the FigShare+ folder in this collection corresponding to Figure 6 (this session was used in the BSOiD model training for the session that was fit in Figure 6).

Note that session names have the following format: "mouse#_bigEndianDate_cage#_info_session#_attempt#". Raw mesoscope imaging data is included in ScanImage rendered format as single big tiffs with the following nomenclature: "filename_2D.tiff". Mouse face and body cam images are included as standalone or concatenated .avi movie files, and behavioral data is included both as Spike2 files (smrx) and in exported form as Matlab data files (.mat).

In all cases the first frame of the 2-photon movie, the right face/body movie, and Spike2 data are aligned to the first Labview-issued frameclock trigger (also recorded in Spike2, along with all other frameclock events). 2-photon triggers were sometimes incorrectly recorded in Spike2 (generally we recorded these as both events and waveforms), but were in all cases additionally exported from ScanImage tiff metadata as timestamps (csv files ending in header.csv). Session start-time timestamps, also exported from ScanImage tiff metadata, appear as .txt files ending in "_starttime.txt".

Preprocessed data (python) can be found in npy files with various names, each containing different subsets of variables relevant to the analysis. For each session, the npy file containing the string "standard_frames" contains the most complete, final-stage set of preprocessed neurobehavioral data (in combined DataFrame format, exportable to nwb), including CCF/MMM alignments. The file containing the string "nb_dump" contains a large set of auxiliary variables that may be needed for additional preprocessing.

Additional image files (tiff, png) and excel worksheets (xlsx, csv) containing high-level data summaries and records of intermediate analysis steps are also included. Please contact the authors for any additional clarifications as needed. See related materials in Collection at: https://doi.org/10.25452/figshare.plus.c.7052513
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A Benchmark Dataset for Deep Learning-based Methods for 3D Topology Optimization.
One can find a description of the provided dataset partitions in Section 3 of Dittmer, S., Erzmann, D., Harms, H., Maass, P., SELTO: Sample-Efficient Learned Topology Optimization (2022) https://arxiv.org/abs/2209.05098.
Every dataset container consists of multiple enumerated pairs of CSV files. Each pair describes a unique topology optimization problem and a corresponding binarized SIMP solution. Every file of the form {i}.csv contains all voxel-wise information about the sample i. Every file of the form {i}_info.csv contains scalar parameters of the topology optimization problem, such as material parameters.
This dataset represents topology optimization problems and solutions on the basis of voxels. We define all spatially varying quantities via the voxels' centers -- rather than via the vertices or surfaces of the voxels.
In {i}.csv files, each row corresponds to one voxel in the design space. The columns correspond to ['x', 'y', 'z', 'design_space', 'dirichlet_x', 'dirichlet_y', 'dirichlet_z', 'force_x', 'force_y', 'force_z', 'density'].
Any of these files with the index i can be imported using pandas by executing:
import pandas as pd
directory = ...
file_path = f'{directory}/{i}.csv'
column_names = ['x', 'y', 'z', 'design_space','dirichlet_x', 'dirichlet_y', 'dirichlet_z', 'force_x', 'force_y', 'force_z', 'density']
data = pd.read_csv(file_path, names=column_names)
From this pandas dataframe one can extract the torch tensors of forces F, Dirichlet conditions ωDirichlet, and design space information ωdesign using the following functions:
import torch
def get_shape_and_voxels(data):
shape = data[['x', 'y', 'z']].iloc[-1].values.astype(int) + 1
vox_x = data['x'].values
vox_y = data['y'].values
vox_z = data['z'].values
voxels = [vox_x, vox_y, vox_z]
return shape, voxels
def get_forces_boundary_conditions_and_design_space(data, shape, voxels):
F = torch.zeros(3, *shape, dtype=torch.float32)
F[0, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['force_x'].values, dtype=torch.float32)
F[1, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['force_y'].values, dtype=torch.float32)
F[2, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['force_z'].values, dtype=torch.float32)
ω_Dirichlet = torch.zeros(3, *shape, dtype=torch.float32)
ω_Dirichlet[0, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['dirichlet_x'].values, dtype=torch.float32)
ω_Dirichlet[1, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['dirichlet_y'].values, dtype=torch.float32)
ω_Dirichlet[2, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['dirichlet_z'].values, dtype=torch.float32)
ω_design = torch.zeros(1, *shape, dtype=int)
ω_design[:, voxels[0], voxels[1], voxels[2]] = torch.from_numpy(data['design_space'].values.astype(int))
return F, ω_Dirichlet, ω_design
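For example, applying these helpers to the dataframe loaded above:

shape, voxels = get_shape_and_voxels(data)
F, ω_Dirichlet, ω_design = get_forces_boundary_conditions_and_design_space(data, shape, voxels)
print(F.shape, ω_Dirichlet.shape, ω_design.shape)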
The corresponding {i}_info.csv files only have one row with column labels ['E', 'ν', 'σ_ys', 'vox_size', 'p_x', 'p_y', 'p_z'].
Analogously to above, one can import any {i}_info.csv file by executing:
file_path = f'{directory}/{i}_info.csv'
data_info_column_names = ['E', 'ν', 'σ_ys', 'vox_size', 'p_x', 'p_y', 'p_z']
data_info = pd.read_csv(file_path, names=data_info_column_names)
Algorithm for the detection of hate expressions in Spanish. This algorithm was developed within the framework of the Hatemedia project (PID2020-114584GB-I00), funded by MCIN/AEI/10.13039/501100011033, with the collaboration of Possible Inc.

The folder structure of the GitHub documentation is as follows:

02 Documentación Github
└── 00_Odio y no odio
    ├── DOCUMENTACIÓN GITHUB.docx
    ├── ejemplo (1).py
    ├── Modelo_binario_ (1) (1).ipynb
    ├── obtener_caracteristicas (1).py
    └── Recursos-20231027T110710Z-001 (1).zip

The content of each file is detailed below:
- DOCUMENTACIÓN GITHUB.docx: report describing how to use the scripts ejemplo (1).py and obtener_caracteristicas (1).py to apply the models.
- ejemplo (1).py: Python script showing how to use the models to make predictions.
- Modelo_binario_ (1) (1).ipynb: notebook with the code used to train the different models.
- obtener_caracteristicas (1).py: Python script with the preprocessing functions applied before the models are used to predict the entries of a dataframe.
- Recursos-20231027T110710Z-001 (1).zip: the resources folder contains 3 .csv files used in feature extraction.

The dataset used for training the models is dataset_completo_caracteristicas_ampliadas_todas_combinaciones_v1_textoProcesado.csv (https://acortar.link/diSV7o).

The algorithm was developed from the tests of the applied models shown below:

MODELOS
├── 70-30
│   ├── CART_binario_70-30.joblib
│   ├── GB_binario_70-30.joblib
│   ├── MLP_binario_70-30.joblib
│   ├── NB_binario_70-30.joblib
│   ├── RF_binario_70-30.joblib
│   └── SVM_binario_70-30.joblib
├── 80-20
│   ├── CART_binario_80-20.joblib
│   ├── GB_binario_80-20.joblib
│   ├── MLP_binario_80-20.joblib
│   ├── NB_binario_80-20.joblib
│   ├── RF_binario_80-20.joblib
│   └── SVM_binario_80-20.joblib
└── 90-10
    ├── CART_binario_90-10.joblib
    ├── GB_binario_90-10.joblib
    ├── MLP_binario_90-10.joblib
    ├── NB_binario_90-10.joblib
    ├── RF_binario_90-10.joblib
    └── SVM_binario_90-10.joblib

The folders 70-30, 80-20 and 90-10 contain the models already trained with the respective train/test split percentages. Results and comparisons generated during the training and validation of the final model used to develop the algorithm are shared in the MODELOS folder (uploaded to GitHub) and in the document Comparativa_V2.xlsx (uploaded to GitHub). The procedure followed to train the models is documented in the technical report on the development of the hate/non-hate classification algorithm for Spanish digital news media on X (Twitter), Facebook and web portals (https://doi.org/10.6084/m9.figshare.26085688.v1).

Authors: Elias Said-Hung, Julio Montero-Díaz, Oscar De Gregorio, Almudena Ruiz-Iniesta, Xiomara Blanco, Juan José Cubillas, Daniel Pérez Palau.
Funded by: Agencia Estatal de Investigación – Ministerio de Ciencia e Innovación. With the support of POSSIBLE S.L.
How to cite: Said-Hung, E., Montero-Diaz, J., De Gregorio Vicente, O., Ruiz-Iniesta, A., Blanco Valencia, X., José Cubillas, J., and Pérez Palau, D. (2023), "Algorithm for classifying hate expressions in Spanish", figshare. https://doi.org/10.6084/m9.figshare.24574906.
More information: https://www.hatemedia.es/ or contact: elias.said@unir.net
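A minimal sketch of applying one of the released models; the paths and the feature table below are assumptions, since in practice the features must first be built with the preprocessing in obtener_caracteristicas (1).py, as illustrated in ejemplo (1).py:

```python
import joblib
import pandas as pd

# Hypothetical: a table of already-extracted features for the texts to classify
features = pd.read_csv("features.csv")

# One of the released binary models (80/20 train/test split)
model = joblib.load("MODELOS/80-20/RF_binario_80-20.joblib")

# Binary hate / non-hate prediction, one value per row of the feature table
print(model.predict(features))
```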
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
3D skeletons UP-Fall Dataset
Difference between Fall and Impact detection
Overview
This dataset aims to facilitate research in fall detection, particularly focusing on the precise detection of impact moments within fall events. The accuracy and comprehensiveness of the 3D skeleton data make it a valuable resource for developing and benchmarking fall detection algorithms. The dataset contains 3D skeletal data extracted from fall events and daily activities of 5 subjects performing fall scenarios.
Data Collection
The skeletal data was extracted using a pose estimation algorithm, which processes image frames to determine the 3D coordinates of each joint. Sequences with fewer than 100 frames of extracted data were excluded to ensure the quality and reliability of the dataset. As a result, some subjects may have fewer CSV files.
CSV Structure
The data is organized by subject, and each subject folder contains CSV files named according to the pattern C1S1A1T1 (camera, subject, activity, trial):
- subject1/: contains CSV files for Subject 1.
- subject2/: contains CSV files for Subject 2.
- subject3/, subject4/, subject5/: same structure as above, but may contain fewer CSV files due to the data extraction criteria mentioned above.
Column Descriptions
Each CSV file contains the following columns representing different skeletal joints and their respective coordinates in 3D space:
| Column Name | Description |
|---|---|
| joint_1_x | X coordinate of joint 1 |
| joint_1_y | Y coordinate of joint 1 |
| joint_1_z | Z coordinate of joint 1 |
| joint_2_x | X coordinate of joint 2 |
| joint_2_y | Y coordinate of joint 2 |
| joint_2_z | Z coordinate of joint 2 |
| ... | ... |
| joint_n_x | X coordinate of joint n |
| joint_n_y | Y coordinate of joint n |
| joint_n_z | Z coordinate of joint n |
| LABEL | Label indicating impact (1) or non-impact (0) |
Example
Here is an example of what a row in one of the CSV files might look like:
| joint_1_x | joint_1_y | joint_1_z | joint_2_x | joint_2_y | joint_2_z | ... | joint_n_x | joint_n_y | joint_n_z | LABEL |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.123 | 0.456 | 0.789 | 0.234 | 0.567 | 0.890 | ... | 0.345 | 0.678 | 0.901 | 0 |
Usage
This data can be used for developing and benchmarking impact fall detection algorithms. It provides detailed information on human posture and movement during falls, making it suitable for machine learning and deep learning applications in impact fall detection and prevention.
Using GitHub
1. Clone the repository:
```bash
git clone https://github.com/Tresor-Koffi/3D_skeletons-UP-Fall-Dataset
```
2. Navigate to the directory:
```bash
cd 3D_skeletons-UP-Fall-Dataset
```
Here's a simple example of how to load and inspect a sample data file using Python:
```python
import pandas as pd

# Load a sample data file for Subject 1, Camera 1, Activity 1, Trial 1
data = pd.read_csv('subject1/C1S1A1T1.csv')
print(data.head())
```
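Building on the snippet above, the joint coordinates and the impact label can be separated for model training (a minimal sketch; column names follow the table above):

```python
# Features: all joint coordinate columns; target: the impact label
X = data.drop(columns=['LABEL'])
y = data['LABEL']
print(X.shape, y.value_counts())
```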
This dataset consists of technical-language annotations collected over four years from two paper machines in northern Sweden, structured as a Pandas dataframe. The same data is also available as a semicolon-separated .csv file. The data consists of two columns, where the first column corresponds to the text content of an annotation and the second to its title. The annotations are written in Swedish and have been processed so that all proper names are replaced by the text string 'egennamn'. Each row corresponds to one annotation with its title.
The data can be processed in Python with:
import pandas as pd
annotations_df = pd.read_pickle("Technical_Language_Annotations.pkl")
annotation_contents = annotations_df['noteComment']
annotation_titles = annotations_df['title']
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1117 Russian cities with city name, region, geographic coordinates and 2020 population estimate.
How to use
from pathlib import Path
import requests
import pandas as pd

url = ("https://raw.githubusercontent.com/"
       "epogrebnyak/ru-cities/main/assets/towns.csv")

# save file locally
p = Path("towns.csv")
if not p.exists():
    content = requests.get(url).text
    p.write_text(content, encoding="utf-8")

# read as dataframe
df = pd.read_csv("towns.csv")
print(df.sample(5))
Files:
Columns (towns.csv):
Basic info:
- city - city name (several cities have alternative names, marked in alt_city_names.json)
- population - city population, thousand people, Rosstat estimate as of 1.1.2020
- lat, lon - city geographic coordinates
Region:
- region_name - subnational region (oblast, republic, krai or AO)
- region_iso_code - ISO 3166 code, e.g. RU-VLD
- federal_district, e.g. Центральный
City codes:
- okato
- oktmo
- fias_id
- kladr_id
Data sources
Comments
City groups
Ханты-Мансийский and Ямало-Ненецкий autonomous regions are excluded to avoid duplication, as they are parts of Тюменская область.
Several notable towns are classified as administrative parts of larger cities (Сестрорецк is a municipality within Saint Petersburg, Щербинка is part of Moscow). They are not reported in this dataset.
By individual city
Белоозерский is not found in the Rosstat publication, but should be considered a city as of 1.1.2020.
Alternative city names
We suppressed the letter "ё" in the city column of towns.csv - we have Орел, but not Орёл. This affected:
- Белоозёрский
- Королёв
- Ликино-Дулёво
- Озёры
- Щёлково
- Орёл
Дмитриев and Дмитриев-Льговский are the same city.
assets/alt_city_names.json contains these names.
Tests
poetry install
poetry run python -m pytest
How to replicate dataset
1. Base dataset
Run: (converts Саратовская область.doc to docx)
Creates:
- _towns.csv
- assets/regions.csv
2. API calls
Note: do not attempt if you do not have to - this runs a while and loads third-party API access. You have the resulting files in the repo, so you probably do not need to run these scripts.
Run:
cd geocoding
Creates:
3. Merge data
Run:
Creates:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
VIPERdb clean data. Contains structural data about the capsids of icosahedral viral genera, as taken from VIPERdb after merging together records of the same genus (see Methods). Rename this file to "viperdb_clean.csv" in order to load it through our Python framework. (CSV 6 kb)