100+ datasets found

R
Dior R Dataset
universe.roboflow.com
zip
Updated Dec 25, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
new-workspace-zecmk (2024). Dior R Dataset [Dataset]. https://universe.roboflow.com/new-workspace-zecmk/dior-r-jbpln/dataset/1
Explore at:
zipAvailable download formats
Dataset updated
Dec 25, 2024
Dataset authored and provided by
new-workspace-zecmk
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
DIOR R Bounding Boxes
Description
DIOR R

## Overview DIOR R is a dataset for object detection tasks - it contains DIOR R annotations for 23,419 images. ## Getting Started You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model. ## License This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Stirred Reactor CFD ML-Dataset
kaggle.com
zip
Updated Dec 18, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
SimLab | AILab | SciSoftLab (2023). Stirred Reactor CFD ML-Dataset [Dataset]. https://www.kaggle.com/datasets/novalabs/stirred-reactor-cfd-ml-dataset
Explore at:
zip(34666 bytes)Available download formats
Dataset updated
Dec 18, 2023
Authors
SimLab | AILab | SciSoftLab
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This dataset consists of reactor features like reactor diameter, liquid height, number of immersed stirrers, number of blades, pitch angle, rpm etc. The response variables are the median values of turbulent kinetic energy (k), turbulent dissipation rate (epsilon), strain rate (strainRate) and velocity magnitute (Umag).

The values were extracted from CFD simulations that were setup and run online with CliqScale.R, a stirred reactor online tool based on OpenFOAM (simpleFoam) and is developed and hosted by Novalabs AG in Zürich, Switzerland: https://novalabs.ch/cliqscaler/. CliqScale.R allows 3 mesh resolutions coarse, medium and fine. The current dataset has been build with coarse mesh. For a dataset with higher resolution please contact us through support.csr@novalabs.ch.

Dataset configuration: - Volume range: 1L – 7000L - 3 volumes per reactor - 4 rpm per volume - 3 number of blades [2,3,4] - 2 pitch angles [45, 60] - 1 baffle configuration [4 baffles] - Fluid: water @ ~ 20°C

Click "view more" for reactor image.

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10105022%2F2c0fa3d909aacb2dbdd879276b46ce50%2Freactor.png?generation=1703064710518069&alt=media" alt="">

Data from: Ecosystem-Level Determinants of Sustained Activity in Open-Source...

zenodo.org

application/gzip, bin +2

Updated Aug 2, 2024

+ more versions

Facebook

Twitter

Click to copy link

Link copied

Cite

Marat Valiev; Marat Valiev; Bogdan Vasilescu; James Herbsleb; Bogdan Vasilescu; James Herbsleb (2024). Ecosystem-Level Determinants of Sustained Activity in Open-Source Projects: A Case Study of the PyPI Ecosystem [Dataset]. http://doi.org/10.5281/zenodo.1419788

Explore at:

bin, application/gzip, zip, text/x-pythonAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.1419788

Dataset updated

Aug 2, 2024

Dataset provided by

Zenodohttp://zenodo.org/

Authors

Marat Valiev; Marat Valiev; Bogdan Vasilescu; James Herbsleb; Bogdan Vasilescu; James Herbsleb

License

https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.htmlhttps://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html

Description

Replication pack, FSE2018 submission #164:
------------------------------------------

**Working title:** Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: 
A Case Study of the PyPI Ecosystem

**Note:** link to data artifacts is already included in the paper. 
Link to the code will be included in the Camera Ready version as well.


Content description
===================

- **ghd-0.1.0.zip** - the code archive. This code produces the dataset files 
 described below
- **settings.py** - settings template for the code archive.
- **dataset_minimal_Jan_2018.zip** - the minimally sufficient version of the dataset.
 This dataset only includes stats aggregated by the ecosystem (PyPI)
- **dataset_full_Jan_2018.tgz** - full version of the dataset, including project-level
 statistics. It is ~34Gb unpacked. This dataset still doesn't include PyPI packages
 themselves, which take around 2TB.
- **build_model.r, helpers.r** - R files to process the survival data 
  (`survival_data.csv` in **dataset_minimal_Jan_2018.zip**, 
  `common.cache/survival_data.pypi_2008_2017-12_6.csv` in 
  **dataset_full_Jan_2018.tgz**)
- **Interview protocol.pdf** - approximate protocol used for semistructured interviews.
- LICENSE - text of GPL v3, under which this dataset is published
- INSTALL.md - replication guide (~2 pages)

Replication guide
=================

Step 0 - prerequisites
----------------------

- Unix-compatible OS (Linux or OS X)
- Python interpreter (2.7 was used; Python 3 compatibility is highly likely)
- R 3.4 or higher (3.4.4 was used, 3.2 is known to be incompatible)

Depending on detalization level (see Step 2 for more details):
- up to 2Tb of disk space (see Step 2 detalization levels)
- at least 16Gb of RAM (64 preferable)
- few hours to few month of processing time

Step 1 - software
----------------

- unpack **ghd-0.1.0.zip**, or clone from gitlab:

   git clone https://gitlab.com/user2589/ghd.git
   git checkout 0.1.0
 
 `cd` into the extracted folder. 
 All commands below assume it as a current directory.
  
- copy `settings.py` into the extracted folder. Edit the file:
  * set `DATASET_PATH` to some newly created folder path
  * add at least one GitHub API token to `SCRAPER_GITHUB_API_TOKENS` 
- install docker. For Ubuntu Linux, the command is 
  `sudo apt-get install docker-compose`
- install libarchive and headers: `sudo apt-get install libarchive-dev`
- (optional) to replicate on NPM, install yajl: `sudo apt-get install yajl-tools`
 Without this dependency, you might get an error on the next step, 
 but it's safe to ignore.
- install Python libraries: `pip install --user -r requirements.txt` . 
- disable all APIs except GitHub (Bitbucket and Gitlab support were
 not yet implemented when this study was in progress): edit
 `scraper/init.py`, comment out everything except GitHub support
 in `PROVIDERS`.

Step 2 - obtaining the dataset
-----------------------------

The ultimate goal of this step is to get output of the Python function 
`common.utils.survival_data()` and save it into a CSV file:

  # copy and paste into a Python console
  from common import utils
  survival_data = utils.survival_data('pypi', '2008', smoothing=6)
  survival_data.to_csv('survival_data.csv')

Since full replication will take several months, here are some ways to speedup
the process:

####Option 2.a, difficulty level: easiest

Just use the precomputed data. Step 1 is not necessary under this scenario.

- extract **dataset_minimal_Jan_2018.zip**
- get `survival_data.csv`, go to the next step

####Option 2.b, difficulty level: easy

Use precomputed longitudinal feature values to build the final table.
The whole process will take 15..30 minutes.

- create a folder `

d
Replication Data for: Revisiting 'The Rise and Decline' in a Population of...
search.dataone.org
dataverse.harvard.edu
Updated Nov 22, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
TeBlunthuis, Nathan; Aaron Shaw; Benjamin Mako Hill (2023). Replication Data for: Revisiting 'The Rise and Decline' in a Population of Peer Production Projects [Dataset]. http://doi.org/10.7910/DVN/SG3LP1
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/SG3LP1
Dataset updated
Nov 22, 2023
Dataset provided by
Harvard Dataverse
Authors
TeBlunthuis, Nathan; Aaron Shaw; Benjamin Mako Hill
Description
This archive contains code and data for reproducing the analysis for “Replication Data for Revisiting ‘The Rise and Decline’ in a Population of Peer Production Projects”. Depending on what you hope to do with the data you probabbly do not want to download all of the files. Depending on your computation resources you may not be able to run all stages of the analysis. The code for all stages of the analysis, including typesetting the manuscript and running the analysis, is in code.tar. If you only want to run the final analysis or to play with datasets used in the analysis of the paper, you want intermediate_data.7z or the uncompressed tab and csv files. The data files are created in a four-stage process. The first stage uses the program “wikiq” to parse mediawiki xml dumps and create tsv files that have edit data for each wiki. The second stage generates all.edits.RDS file which combines these tsvs into a dataset of edits from all the wikis. This file is expensive to generate and at 1.5GB is pretty big. The third stage builds smaller intermediate files that contain the analytical variables from these tsv files. The fourth stage uses the intermediate files to generate smaller RDS files that contain the results. Finally, knitr and latex typeset the manuscript. A stage will only run if the outputs from the previous stages do not exist. So if the intermediate files exist they will not be regenerated. Only the final analysis will run. The exception is that stage 4, fitting models and generating plots, always runs. If you only want to replicate from the second stage onward, you want wikiq_tsvs.7z. If you want to replicate everything, you want wikia_mediawiki_xml_dumps.7z.001 wikia_mediawiki_xml_dumps.7z.002, and wikia_mediawiki_xml_dumps.7z.003. These instructions work backwards from building the manuscript using knitr, loading the datasets, running the analysis, to building the intermediate datasets. Building the manuscript using knitr This requires working latex, latexmk, and knitr installations. Depending on your operating system you might install these packages in different ways. On Debian Linux you can run apt install r-cran-knitr latexmk texlive-latex-extra. Alternatively, you can upload the necessary files to a project on Overleaf.com. Download code.tar. This has everything you need to typeset the manuscript. Unpack the tar archive. On a unix system this can be done by running tar xf code.tar. Navigate to code/paper_source. Install R dependencies. In R. run install.packages(c("data.table","scales","ggplot2","lubridate","texreg")) On a unix system you should be able to run make to build the manuscript generalizable_wiki.pdf. Otherwise you should try uploading all of the files (including the tables, figure, and knitr folders) to a new project on Overleaf.com. Loading intermediate datasets The intermediate datasets are found in the intermediate_data.7z archive. They can be extracted on a unix system using the command 7z x intermediate_data.7z. The files are 95MB uncompressed. These are RDS (R data set) files and can be loaded in R using the readRDS. For example newcomer.ds <- readRDS("newcomers.RDS"). If you wish to work with these datasets using a tool other than R, you might prefer to work with the .tab files. Running the analysis Fitting the models may not work on machines with less than 32GB of RAM. If you have trouble, you may find the functions in lib-01-sample-datasets.R useful to create stratified samples of data for fitting models. See line 89 of 02_model_newcomer_survival.R for an example. Download code.tar and intermediate_data.7z to your working folder and extract both archives. On a unix system this can be done with the command tar xf code.tar && 7z x intermediate_data.7z. Install R dependencies. install.packages(c("data.table","ggplot2","urltools","texreg","optimx","lme4","bootstrap","scales","effects","lubridate","devtools","roxygen2")). On a unix system you can simply run regen.all.sh to fit the models, build the plots and create the RDS files. Generating datasets Building the intermediate files The intermediate files are generated from all.edits.RDS. This process requires about 20GB of memory. Download all.edits.RDS, userroles_data.7z,selected.wikis.csv, and code.tar. Unpack code.tar and userroles_data.7z. On a unix system this can be done using tar xf code.tar && 7z x userroles_data.7z. Install R dependencies. In R run install.packages(c("data.table","ggplot2","urltools","texreg","optimx","lme4","bootstrap","scales","effects","lubridate","devtools","roxygen2")). Run 01_build_datasets.R. Building all.edits.RDS The intermediate RDS files used in the analysis are created from all.edits.RDS. To replicate building all.edits.RDS, you only need to run 01_build_datasets.R when the int... Visit https://dataone.org/datasets/sha256%3Acfa4980c107154267d8eb6dc0753ed0fde655a73a062c0c2f5af33f237da3437 for complete metadata about this dataset.
d
Current Population Survey (CPS)
search.dataone.org
dataverse.harvard.edu
Updated Nov 21, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/AK4FDD
Dataset updated
Nov 21, 2023
Dataset provided by
Harvard Dataverse
Authors
Damico, Anthony
Description
analyze the current population survey (cps) annual social and economic supplement (asec) with r the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics ( bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups b y state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be t reated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show. this new github repository contains three scripts: 2005-2012 asec - download all microdata.R down load the fixed-width file containing household, family, and person records import by separating this file into three tables, then merge 'em together at the person-level download the fixed-width file containing the person-level replicate weights merge the rectangular person-level file with the replicate weights, then store it in a sql database create a new variable - one - in the data table 2012 asec - analysis examples.R connect to the sql database created by the 'download all microdata' progr am create the complex sample survey object, using the replicate weights perform a boatload of analysis examples replicate census estimates - 2011.R connect to the sql database created by the 'download all microdata' program create the complex sample survey object, using the replicate weights match the sas output shown in the png file below 2011 asec replicate weight sas output.png statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document. click here to view these three scripts for more detail about the current population survey - annual social and economic supplement (cps-asec), visit: the census bureau's current population survey page the bureau of labor statistics' current population survey page the current population survey's wikipedia article notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current populat ion survey to talk about america, subract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research. confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
bb_to_run_c
kaggle.com
zip
Updated Oct 3, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
rpsantosa_kaggle (2020). bb_to_run_c [Dataset]. https://www.kaggle.com/rpsantosakaggle/bb-to-run-c
Explore at:
zip(2261265 bytes)Available download formats
Dataset updated
Oct 3, 2020
Authors
rpsantosa_kaggle
Description
Dataset

This dataset was created by rpsantosa_kaggle

Contents
f
R dataset for processing 16S rRNA sequences
datasetcatalog.nlm.nih.gov
figshare.com
Updated Oct 17, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Renaudin, Marie (2022). R dataset for processing 16S rRNA sequences [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000259294
Explore at:
Dataset updated
Oct 17, 2022
Authors
Renaudin, Marie
Description
R dataset used in the manuscript entitled "Unraveling global and diazotrophic bacteriomes of boreal forest floor feather mosses and their environmental drivers at the ecosystem and at the plant scale in North America".See file "R code for processing 16S rRNA sequences" to download the necessary R code.
e
Simple download service (Atom) of the dataset: Aerodrome Exposure Plan...
data.europa.eu
unknown
Updated Mar 1, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2022). Simple download service (Atom) of the dataset: Aerodrome Exposure Plan (Noise Curves) of Vienna [Dataset]. https://data.europa.eu/data/datasets/fr-120066022-srv-c83e8d25-8470-42b4-bb66-f620b67496be?locale=en
Explore at:
unknownAvailable download formats
Dataset updated
Mar 1, 2022
Description
The decision to establish an EFP shall be taken by the Prefect. The approved EFP is then annexed to the local planning plan. The EPB may be revised at the request of the Prefect or on a proposal from the ECA. The aerodromes to be equipped with an EFP shall be those classified in categories A, B and C. These are also aerodromes listed on a list drawn up by orders of the Ministers responsible for defence, urban planning, civil aviation and the environment. Of the 600 aerodromes in France, 190 of them have a BDP. Legal texts: —Articles L147-1 to 8 and R 147-1 to 11 of the Urban Planning Code. —Articles L571-13, R 571-58 to 65 and R 571-70 to 80 of the Environmental Code. —Articles L123-1 to 16 and R 123-6 to 23 of the Environmental Code. — Decree of 28 March 1988 laying down the list of aerodromes not classified in category A, B or C to be equipped with an EDP as amended by the Orders of 17/01/97, 04/09/03 and 27/05/05. —Articles L227-1 to 10 of the Civil Aviation Code. In the department of Vienna, only the Poitiers-Biard airfield has an EPB (Prefectural Decree n°2007-D2B3-194 of 02/07/2007)
F
i.c.sens Visual-Inertial-LiDAR Dataset
data.uni-hannover.de
bag, jpeg, pdf, png +2
Updated Dec 12, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
i.c.sens (2024). i.c.sens Visual-Inertial-LiDAR Dataset [Dataset]. https://data.uni-hannover.de/dataset/i-c-sens-visual-inertial-lidar-dataset
Explore at:
bag, jpeg, txt, pdf, png, rvizAvailable download formats
Dataset updated
Dec 12, 2024
Dataset authored and provided by
i.c.sens
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
The i.c.sens Visual-Inertial-LiDAR Dataset is a data set for the evaluation of dead reckoning or SLAM approaches in the context of mobile robotics. It consists of street-level monocular RGB camera images, a front-facing 180° point cloud, angular velocities, accelerations and an accurate ground truth trajectory. In total, we provide around 77 GB of data resulting from a 15 minutes drive, which is split into 8 rosbags of 2 minutes (10 GB) each. Besides, the intrinsic camera parameters and the extrinsic transformations between all sensor coordinate systems are given. Details on the data and its usage can be found in the provided documentation file.

https://data.uni-hannover.de/dataset/0bcea595-0786-44f6-a9e2-c26a779a004b/resource/0ff90ef9-fa61-4ee3-b69e-eb6461abc57b/download/sensor_platform_small.jpg" alt="">

Image credit: Sören Vogel

The data set was acquired in the context of the measurement campaign described in Schoen2018. Here, a vehicle, which can be seen below, was equipped with a self-developed sensor platform and a commercially available Riegl VMX-250 Mobile Mapping System. This Mobile Mapping System consists of two laser scanners, a camera system and a localization unit containing a highly accurate GNSS/IMU system.

https://data.uni-hannover.de/dataset/0bcea595-0786-44f6-a9e2-c26a779a004b/resource/2a1226b8-8821-4c46-b411-7d63491963ed/download/vehicle_small.jpg" alt="">

Image credit: Sören Vogel

The data acquisition took place in May 2019 during a sunny day in the Nordstadt of Hannover (coordinates: 52.388598, 9.716389). The route we took can be seen below. This route was completed three times in total, which amounts to a total driving time of 15 minutes.

https://data.uni-hannover.de/dataset/0bcea595-0786-44f6-a9e2-c26a779a004b/resource/8a570408-c392-4bd7-9c1e-26964f552d6c/download/google_earth_overview_small.png" alt="">

The self-developed sensor platform consists of several sensors. This dataset provides data from the following sensors:

Velodyne HDL-64 LiDAR

LORD MicroStrain 3DM-GQ4-45 GNSS aided IMU

Pointgrey GS3-U3-23S6C-C RGB camera

To inspect the data, first start a rosmaster and launch rviz using the provided configuration file:

roscore & rosrun rviz rviz -d icsens_data.rviz

Afterwards, start playing a rosbag with

rosbag play icsens-visual-inertial-lidar-dataset-{number}.bag --clock

Below we provide some exemplary images and their corresponding point clouds.

https://data.uni-hannover.de/dataset/0bcea595-0786-44f6-a9e2-c26a779a004b/resource/dc1563c0-9b5f-4c84-b432-711916cb204c/download/combined_examples_small.jpg" alt="">

Related publications:

R. Voges, C. S. Wieghardt, and B. Wagner, “Finding Timestamp Offsets for a Multi-Sensor System Using Sensor Observations,” Photogrammetric Engineering & Remote Sensing, vol. 84, no. 6, pp. 357–366, 2018.

R. Voges and B. Wagner, “RGB-Laser Odometry Under Interval Uncertainty for Guaranteed Localization,” in Book of Abstracts of the 11th Summer Workshop on Interval Methods (SWIM 2018), Rostock, Germany, Jul. 2018.

R. Voges and B. Wagner, “Timestamp Offset Calibration for an IMU-Camera System Under Interval Uncertainty,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, Oct. 2018.

R. Voges and B. Wagner, “Extrinsic Calibration Between a 3D Laser Scanner and a Camera Under Interval Uncertainty,” in Book of Abstracts of the 12th Summer Workshop on Interval Methods (SWIM 2019), Palaiseau, France, Jul. 2019.

R. Voges, B. Wagner, and V. Kreinovich, “Efficient Algorithms for Synchronizing Localization Sensors Under Interval Uncertainty,” Reliable Computing (Interval Computations), vol. 27, no. 1, pp. 1–11, 2020.

R. Voges, B. Wagner, and V. Kreinovich, “Odometry under Interval Uncertainty: Towards Optimal Algorithms, with Potential Application to Self-Driving Cars and Mobile Robots,” Reliable Computing (Interval Computations), vol. 27, no. 1, pp. 12–20, 2020.

R. Voges and B. Wagner, “Set-Membership Extrinsic Calibration of a 3D LiDAR and a Camera,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, Oct. 2020, accepted.

R. Voges, “Bounded-Error Visual-LiDAR Odometry on Mobile Robots Under Consideration of Spatiotemporal Uncertainties,” PhD thesis, Gottfried Wilhelm Leibniz Universität, 2020.
d
Health and Retirement Study (HRS)
search.dataone.org
dataverse.harvard.edu
Updated Nov 21, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Damico, Anthony (2023). Health and Retirement Study (HRS) [Dataset]. http://doi.org/10.7910/DVN/ELEKOY
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/ELEKOY
Dataset updated
Nov 21, 2023
Dataset provided by
Harvard Dataverse
Authors
Damico, Anthony
Description
analyze the health and retirement study (hrs) with r the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-sectional or shorter panel surveys, respondents keep responding until, well, death d o us part. paid for by the national institute on aging and administered by the university of michigan's institute for social research, if you apply for an interviewer job with them, i hope you like werther's original. figuring out how to analyze this data set might trigger your fight-or-flight synapses if you just start clicking arou nd on michigan's website. instead, read pages numbered 10-17 (pdf pages 12-19) of this introduction pdf and don't touch the data until you understand figure a-3 on that last page. if you start enjoying yourself, here's the whole book. after that, it's time to register for access to the (free) data. keep your username and password handy, you'll need it for the top of the download automation r script. next, look at this data flowchart to get an idea of why the data download page is such a righteous jungle. but wait, good news: umich recently farmed out its data management to the rand corporation, who promptly constructed a giant consolidated file with one record per respondent across the whole panel. oh so beautiful. the rand hrs files make much of the older data and syntax examples obsolete, so when you come across stuff like instructions on how to merge years, you can happily ignore them - rand has done it for you. the health and retirement study only includes noninstitutionalized adults when new respondents get added to the panel (as they were in 1992, 1993, 1998, 2004, and 2010) but once they're in, they're in - respondents have a weight of zero for interview waves when they were nursing home residents; but they're still responding and will continue to contribute to your statistics so long as you're generalizing about a population from a previous wave (for example: it's possible to compute "among all americans who were 50+ years old in 1998, x% lived in nursing homes by 2010"). my source for that 411? page 13 of the design doc. wicked. this new github repository contains five scripts: 1992 - 2010 download HRS microdata.R loop through every year and every file, download, then unzip everything in one big party impor t longitudinal RAND contributed files.R create a SQLite database (.db) on the local disk load the rand, rand-cams, and both rand-family files into the database (.db) in chunks (to prevent overloading ram) longitudinal RAND - analysis examples.R connect to the sql database created by the 'import longitudinal RAND contributed files' program create tw o database-backed complex sample survey object, using a taylor-series linearization design perform a mountain of analysis examples with wave weights from two different points in the panel import example HRS file.R load a fixed-width file using only the sas importation script directly into ram with < a href="http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html">SAScii parse through the IF block at the bottom of the sas importation script, blank out a number of variables save the file as an R data file (.rda) for fast loading later replicate 2002 regression.R connect to the sql database created by the 'import longitudinal RAND contributed files' program create a database-backed complex sample survey object, using a taylor-series linearization design exactly match the final regression shown in this document provided by analysts at RAND as an update of the regression on pdf page B76 of this document . click here to view these five scripts for more detail about the health and retirement study (hrs), visit: michigan's hrs homepage rand's hrs homepage the hrs wikipedia page a running list of publications using hrs notes: exemplary work making it this far. as a reward, here's the detailed codebook for the main rand hrs file. note that rand also creates 'flat files' for every survey wave, but really, most every analysis you c an think of is possible using just the four files imported with the rand importation script above. if you must work with the non-rand files, there's an example of how to import a single hrs (umich-created) file, but if you wish to import more than one, you'll have to write some for loops yourself. confidential to sas, spss, stata, and sudaan users: a tidal wave is coming. you can get water up your nose and be dragged out to sea, or you can grab a surf board. time to transition to r. :D
d
Data release for solar-sensor angle analysis subset associated with the...
catalog.data.gov
data.usgs.gov
+1more
Updated Nov 27, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2025). Data release for solar-sensor angle analysis subset associated with the journal article "Solar and sensor geometry, not vegetation response, drive satellite NDVI phenology in widespread ecosystems of the western United States" [Dataset]. https://catalog.data.gov/dataset/data-release-for-solar-sensor-angle-analysis-subset-associated-with-the-journal-article-so
Explore at:
Dataset updated
Nov 27, 2025
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Area covered
Western United States, United States
Description
This dataset provides geospatial location data and scripts used to analyze the relationship between MODIS-derived NDVI and solar and sensor angles in a pinyon-juniper ecosystem in Grand Canyon National Park. The data are provided in support of the following publication: "Solar and sensor geometry, not vegetation response, drive satellite NDVI phenology in widespread ecosystems of the western United States". The data and scripts allow users to replicate, test, or further explore results. The file GrcaScpnModisCellCenters.csv contains locations (latitude-longitude) of all the 250-m MODIS (MOD09GQ) cell centers associated with the Grand Canyon pinyon-juniper ecosystem that the Southern Colorado Plateau Network (SCPN) is monitoring through its land surface phenology and integrated upland monitoring programs. The file SolarSensorAngles.csv contains MODIS angle measurements for the pixel at the phenocam location plus a random 100 point subset of pixels within the GRCA-PJ ecosystem. The script files (folder: 'Code') consist of 1) a Google Earth Engine (GEE) script used to download MODIS data through the GEE javascript interface, and 2) a script used to calculate derived variables and to test relationships between solar and sensor angles and NDVI using the statistical software package 'R'. The file Fig_8_NdviSolarSensor.JPG shows NDVI dependence on solar and sensor geometry demonstrated for both a single pixel/year and for multiple pixels over time. (Left) MODIS NDVI versus solar-to-sensor angle for the Grand Canyon phenocam location in 2018, the year for which there is corresponding phenocam data. (Right) Modeled r-squared values by year for 100 randomly selected MODIS pixels in the SCPN-monitored Grand Canyon pinyon-juniper ecosystem. The model for forward-scatter MODIS-NDVI is log(NDVI) ~ solar-to-sensor angle. The model for back-scatter MODIS-NDVI is log(NDVI) ~ solar-to-sensor angle + sensor zenith angle. Boxplots show interquartile ranges; whiskers extend to 10th and 90th percentiles. The horizontal line marking the average median value for forward-scatter r-squared (0.835) is nearly indistinguishable from the back-scatter line (0.833). The dataset folder also includes supplemental R-project and packrat files that allow the user to apply the workflow by opening a project that will use the same package versions used in this study (eg, .folders Rproj.user, and packrat, and files .RData, and PhenocamPR.Rproj). The empty folder GEE_DataAngles is included so that the user can save the data files from the Google Earth Engine scripts to this location, where they can then be incorporated into the r-processing scripts without needing to change folder names. To successfully use the packrat information to replicate the exact processing steps that were used, the user should refer to packrat documentation available at https://cran.r-project.org/web/packages/packrat/index.html and at https://www.rdocumentation.org/packages/packrat/versions/0.5.0. Alternatively, the user may also use the descriptive documentation phenopix package documentation, and description/references provided in the associated journal article to process the data to achieve the same results using newer packages or other software programs.
Datasets for Sentiment Analysis
zenodo.org
csv
Updated Dec 10, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
Explore at:
csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.10157504
Dataset updated
Dec 10, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.

----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset is having the data of 1K+ Amazon Product's Ratings and Reviews as per their details listed on the official website of Amazon. The data was scraped in the month of January 2023 from the Official Website of Amazon.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
product_id - Product ID
product_name - Name of the Product
category - Category of the Product
discounted_price - Discounted Price of the Product
actual_price - Actual Price of the Product
discount_percentage - Percentage of Discount for the Product
rating - Rating of the Product
rating_count - Number of people who voted for the Amazon rating
about_product - Description about the Product
user_id - ID of the user who wrote review for the Product
user_name - Name of the user who wrote review for the Product
review_id - ID of the user review
review_title - Short review
review_content - Long review
img_link - Image Link of the Product
product_link - Official Website Link of the Product
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5331 rows contains only negative samples and the last 5331 rows contain only positive samples, thus the data should be shuffled before usage.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed amazon product review data of Gen3EcoDot (Alexa) scrapped entirely from amazon.in
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
----------- Amazon Earphones Reviews ----------------
This dataset consists of a 9930 Amazon reviews, star ratings, for 10 latest (as of mid-2019) bluetooth earphone devices for learning how to train Machine for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review (raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
e
Dataset Direct Download Service (WFS): Hunting and Wildlife Area (Savoie)
data.europa.eu
wfs
Updated Mar 27, 2019
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2019). Dataset Direct Download Service (WFS): Hunting and Wildlife Area (Savoie) [Dataset]. https://data.europa.eu/data/datasets/fr-120066022-srv-0e888c3f-76cc-4d31-b6df-35d3ba58947b
Explore at:
wfsAvailable download formats
Dataset updated
Mar 27, 2019
Description
The purpose of hunting and wildlife reserves is the protection of migratory birds and endangered species and the development of tools for the management of wildlife and their habitats. Hunting and wildlife reserves shall be established, in accordance with their interest, by the prefect (Article R 422-82 EC) or by the Minister responsible for hunting (national reserves Article R422-92 EC).The hunting and wildlife reserves established by prefectural decree shall be established at the request of the holder of the right to hunt and prohibit any hunting act. They play an important role in preserving wildlife and, in particular, species that can be hunted. National hunting and wildlife reserves are hunting and wildlife reserves of particular importance (Article R.422-92 of the Environmental Code):- either because of their extent;- or because they contain species whose numbers are declining in all or part of the national territory or species of remarkable qualities;- or based on scientific, technical or practical demonstrations pursued there.Legal References: — Article L. 422-27 (amended by the Law on the Development of Rural Territories of 23 February 2005) — Articles R. 422-82 to R. 422-94 of the Environmental Code — Decree of 23 September 1991 as amended.
S
R code for obtaining species occurrence data from open datasets
scidb.cn
Updated Feb 27, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yuanbao Du; Xuan Liu; Yusi Xin (2023). R code for obtaining species occurrence data from open datasets [Dataset]. http://doi.org/10.57760/sciencedb.07577
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.57760/sciencedb.07577
Dataset updated
Feb 27, 2023
Dataset provided by
Science Data Bank
Authors
Yuanbao Du; Xuan Liu; Yusi Xin
License
https://api.github.com/licenses/cc0-1.0https://api.github.com/licenses/cc0-1.0
Description
This R code is used to download species occurrence data from open datasets such as Global Biodiversity Information Facility, Atlas of Living Australia, Biodiversity Information Serving Our Nation, iNaturalist, eBird, and Integrated Digitized Biocollections.
e
Dataset Direct Download Service (WFS): Easement plate A5 linked to public...
data.europa.eu
unknown
Updated Sep 19, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2021). Dataset Direct Download Service (WFS): Easement plate A5 linked to public water and sewerage pipes — digitised version DDT 63 [Dataset]. https://data.europa.eu/data/datasets/fr-120066022-srv-49dabf4c-fe0f-4238-b59d-7b343ffa8b79
Explore at:
unknownAvailable download formats
Dataset updated
Sep 19, 2021
Description
Public utility easements (SUPs) are administrative limitations on the right to property, they are established for the benefit of public persons, concessionaires of public works or public works, private persons engaged in an activity in the public interest. The collection and conservation of public utility easements is a sovereign task of the State, which must bring them to the attention of local and regional authorities so that they may annex them to their urban planning documents. The public utility easements concerned are those defined by Articles L. 126-1 and R. 126-1 of the Urban Planning Code and their annexes.

Type A5 SUPs are established for the benefit of public authorities, public establishments or public service concessionaires which undertake works for the establishment of drinking water pipes or the disposal of wastewater or rainwater, a servitude conferring on them the right to permanently establish underground pipes on unbuilt private land, except the courtyards and gardens adjacent to the dwellings.

Texts in force: Articles L. 152-1, L. 152-2 and R.152-1 to R. 152-15 of the Rural and Maritime Fisheries Code
Market Basket Analysis
kaggle.com
zip
Updated Dec 9, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
Explore at:
zip(23875170 bytes)Available download formats
Dataset updated
Dec 9, 2021
Authors
Aslan Ahmedov
Description
Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions on itemset that a customer is most likely to purchase .I was given dataset contains data of a retailer; the transaction data provides data around all the transactions that have happened over a period of time. Retailer will use result to grove in his industry and provide for customer suggestions on itemset, we be able increase customer engagement and improve customer experience and identify customer behavior. I will solve this problem with use Association Rules type of unsupervised learning technique that checks for the dependency of one data item on another data item.

Introduction

Association Rule is most used when you are planning to build association in different objects in a set. It works when you are planning to find frequent patterns in a transaction database. It can tell you what items do customers frequently buy together and it allows retailer to identify relationships between the items.

An Example of Association Rules

Assume there are 100 customers, 10 of them bought Computer Mouth, 9 bought Mat for Mouse and 8 bought both of them. - bought Computer Mouth => bought Mat for Mouse - support = P(Mouth & Mat) = 8/100 = 0.08 - confidence = support/P(Mat for Mouse) = 0.08/0.09 = 0.89 - lift = confidence/P(Computer Mouth) = 0.89/0.10 = 8.9 This just simple example. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

Strategy

Data Import

Data Understanding and Exploration

Transformation of the data – so that is ready to be consumed by the association rules algorithm

Running association rules

Exploring the rules generated

Filtering the generated rules

Visualization of Rule

Dataset Description

File name: Assignment-1_Data

List name: retaildata

File format: . xlsx

Number of Row: 522065

Number of Attributes: 7

BillNo: 6-digit number assigned to each transaction. Nominal.

Itemname: Product name. Nominal.

Quantity: The quantities of each product per transaction. Numeric.

Date: The day and time when each transaction was generated. Numeric.

Price: Product price. Numeric.

CustomerID: 5-digit number assigned to each customer. Nominal.

Country: Name of the country where each customer resides. Nominal.

https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png">

Libraries in R

First, we need to load required libraries. Shortly I describe all libraries.

arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).

arulesViz - Extends package 'arules' with various visualization. techniques for association rules and item-sets. The package also includes several interactive visualizations for rule exploration.

tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.

readxl - Read Excel Files in R.

plyr - Tools for Splitting, Applying and Combining Data.

ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

knitr - Dynamic Report generation in R.

magrittr- Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.

dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png">

Data Pre-processing

Next, we need to upload Assignment-1_Data. xlsx to R to read the dataset.Now we can see our data in R.

https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png"> https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png">

After we will clear our data frame, will remove missing values.

https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png">

To apply Association Rule mining, we need to convert dataframe into transaction data to make all items that are bought together in one invoice will be in ...
g
Simple download service (Atom) of the dataset: N BRUIT ZBR INFRA R A0006 A...
gimi9.com
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Simple download service (Atom) of the dataset: N BRUIT ZBR INFRA R A0006 A LD S 045 [Dataset]. https://gimi9.com/dataset/eu_fr-120066022-srv-cb74b32c-efee-40a3-8419-81ddb28fb6e7
Explore at:
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Noise level zones describe a noise exposure situation based on a noise indicator or area affected by noise. They are used primarily for the preparation of strategic noise maps pursuant to Article R.572-5 of the Environmental Code. The data in this table present the exposure characteristics of the day, evening and night populations.
Cleaned NHANES 1988-2018
figshare.com
txt
Updated Feb 18, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). Cleaned NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v9
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.21743372.v9
Dataset updated
Feb 18, 2025
Dataset provided by
Figsharehttp://figshare.com/
Authors
Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The National Health and Nutrition Examination Survey (NHANES) provides data and have considerable potential to study the health and environmental exposure of the non-institutionalized US population. However, as NHANES data are plagued with multiple inconsistencies, processing these data is required before deriving new insights through large-scale analyses. Thus, we developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous (1999-2018), totaling 135,310 participants and 5,078 variables. The variables conveydemographics (281 variables),dietary consumption (324 variables),physiological functions (1,040 variables),occupation (61 variables),questionnaires (1444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood),medications (29 variables),mortality information linked from the National Death Index (15 variables),survey weights (857 variables),environmental exposure biomarker measurements (598 variables), andchemical comments indicating which measurements are below or above the lower limit of detection (505 variables).csv Data Record: The curated NHANES datasets and the data dictionaries includes 23 .csv files and 1 excel file.The curated NHANES datasets involves 20 .csv formatted files, two for each module with one as the uncleaned version and the other as the cleaned version. The modules are labeled as the following: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments."dictionary_nhanes.csv" is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 5,078 variables in NHANES."dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables.“dictionary_drug_codes.csv” contains the dictionary for descriptors on the drugs codes.“nhanes_inconsistencies_documentation.xlsx” is an excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES modules.R Data Record: For researchers who want to conduct their analysis in the R programming language, only cleaned NHANES modules and the data dictionaries can be downloaded as a .zip file which include an .RData file and an .R file.“w - nhanes_1988_2018.RData” contains all the aforementioned datasets as R data objects. We make available all R scripts on customized functions that were written to curate the data.“m - nhanes_1988_2018.R” shows how we used the customized functions (i.e. our pipeline) to curate the original NHANES data.Example starter codes: The set of starter code to help users conduct exposome analysis consists of four R markdown files (.Rmd). We recommend going through the tutorials in order.“example_0 - merge_datasets_together.Rmd” demonstrates how to merge the curated NHANES datasets together.“example_1 - account_for_nhanes_design.Rmd” demonstrates how to conduct a linear regression model, a survey-weighted regression model, a Cox proportional hazard model, and a survey-weighted Cox proportional hazard model.“example_2 - calculate_summary_statistics.Rmd” demonstrates how to calculate summary statistics for one variable and multiple variables with and without accounting for the NHANES sampling design.“example_3 - run_multiple_regressions.Rmd” demonstrates how run multiple regression models with and without adjusting for the sampling design.
R
Augmentasi R Dataset
universe.roboflow.com
zip
Updated Jun 27, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Universiatas Muslim Indonesia (2024). Augmentasi R Dataset [Dataset]. https://universe.roboflow.com/universiatas-muslim-indonesia/augmentasi-r/dataset/1
Explore at:
zipAvailable download formats
Dataset updated
Jun 27, 2024
Dataset authored and provided by
Universiatas Muslim Indonesia
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
R Bounding Boxes
Description
Augmentasi R

## Overview Augmentasi R is a dataset for object detection tasks - it contains R annotations for 299 images. ## Getting Started You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model. ## License This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Personal Protective Equipment Dataset (PPED)
zenodo.org
data.niaid.nih.gov
bin, zip
Updated May 17, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anonymous; Anonymous (2022). Personal Protective Equipment Dataset (PPED) [Dataset]. http://doi.org/10.5281/zenodo.6551758
Explore at:
zip, binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.6551758
Dataset updated
May 17, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Anonymous; Anonymous
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Personal Protective Equipment Dataset (PPED)

This dataset serves as a benchmark for PPE in chemical plants
We provide datasets and experimental results.

1. The dataset

We produced a data set based on the actual needs and relevant regulations in chemical plants.
The standard GB 39800.1-2020 formulated by the Ministry of Emergency Management of the People’s Republic of China defines the protective requirements for plants and chemical laboratories.
The complete dataset is contained in the folder PPED/data.

1.1. Image collection

We took more than 3300 pictures.
We set the following different characteristics, including different environments, different distances, different lighting conditions, different angles, and the diversity of the number of people photographed.

Backgrounds: There are 4 backgrounds, including office, near machines, factory and regular outdoor scenes.

Scale: By taking pictures from different distances, the captured PPEs are classified in small, medium and large scales.

Light: Good lighting conditions and poor lighting conditions were studied.

Diversity: Some images contain a single person, and some contain multiple people.

Angle: The pictures we took can be divided into front and side.

A total of more than 3300 photos were taken in the raw data under all conditions.
All images are located in the folder “PPED/data/JPEGImages”.

1.2. Label

We use Labelimg as the labeling tool, and we use the PASCAL-VOC labelimg format.
Yolo use the txt format, we can use trans_voc2yolo.py to convert the XML file in PASCAL-VOC format to txt file.
Annotations are stored in the folder PPED/data/Annotations

1.3. Dataset Features

The pictures are made by us according to the different conditions mentioned above.
The file PPED/data/feature.csv is a CSV file which notes all the .os of all the image. It records every feature of the picture, including lighting conditions, angles, backgrounds, number of people and scale.

1.4. Dataset Division

The data set is divided into 9:1 training set and test set.

2. Baseline Experiments

We provide baseline results with five models, namely Faster R-CNN ®, Faster R-CNN (M), SSD, YOLOv3-spp, and YOLOv5.
All code and results is given in folder PPED/experiment.

2.1. Environment and Configuration:

Intel Core i7-8700 CPU

NVIDIA GTX1060 GPU

16 GB of RAM

Python: 3.8.10

pytorch: 1.9.0

pycocotools: pycocotools-win

Windows 10

2.2. Applied Models

The source codes and results of the applied models is given in folder PPED/experiment with sub-folders corresponding to the model names.

2.2.1. Faster R-CNN

Faster R-CNN

backbone: resnet50+fpn

We downloaded the pre-training weights from https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth.

We modified the dataset path, training classes and training parameters including batch size.

We run train_res50_fpn.py start training.

Then, the weights are trained by the training set.

Finally, we validate the results on the test set.

backbone: mobilenetv2

the same training method as resnet50+fpn, but the effect is not as good as resnet50+fpn, so it is directly discarded.

The Faster R-CNN source code used in our experiment is given in folder PPED/experiment/Faster R-CNN.
The weights of the fully-trained Faster R-CNN (R), Faster R-CNN (M) model are stored in file PPED/experiment/trained_models/resNetFpn-model-19.pth and mobile-model.pth.
The performance measurements of Faster R-CNN (R) Faster R-CNN (M) are stored in folder PPED/experiment/results/Faster RCNN(R)and Faster RCNN(M).

2.2.2. SSD

backbone: resnet50

We downloaded pre-training weights from https://download.pytorch.org/models/resnet50-19c8e357.pth.

The same training method as Faster R-CNN is applied.

The SSD source code used in our experiment is given in folder PPED/experiment/ssd.
The weights of the fully-trained SSD model are stored in file PPED/experiment/trained_models/SSD_19.pth.
The performance measurements of SSD are stored in folder PPED/experiment/results/SSD.

2.2.3. YOLOv3-spp

backbone: DarkNet53

We modified the type information of the XML file to match our application.

We run trans_voc2yolo.py to convert the XML file in VOC format to a txt file.

The weights used are: yolov3-spp-ultralytics-608.pt.

The YOLOv3-spp source code used in our experiment is given in folder PPED/experiment/YOLOv3-spp.
The weights of the fully-trained YOLOv3-spp model are stored in file PPED/experiment/trained_models/YOLOvspp-19.pt.
The performance measurements of YOLOv3-spp are stored in folder PPED/experiment/results/YOLOv3-spp.

2.2.4. YOLOv5

backbone: CSP_DarkNet

We modified the type information of the XML file to match our application.

We run trans_voc2yolo.py to convert the XML file in VOC format to a txt file.

The weights used are: yolov5s.

The YOLOv5 source code used in our experiment is given in folder PPED/experiment/yolov5.
The weights of the fully-trained YOLOv5 model are stored in file PPED/experiment/trained_models/YOLOv5.pt.
The performance measurements of YOLOv5 are stored in folder PPED/experiment/results/YOLOv5.

2.3. Evaluation

The computed evaluation metrics as well as the code needed to compute them from our dataset are provided in the folder PPED/experiment/eval.

3. Code Sources

Faster R-CNN (R and M)

https://github.com/WZMIAOMIAO/deep-learning-for-image-processing/tree/master/pytorch_object_detection/faster_rcnn

official code: https://github.com/pytorch/vision/blob/main/torchvision/models/detection/faster_rcnn.py

SSD

https://github.com/WZMIAOMIAO/deep-learning-for-image-processing/tree/master/pytorch_object_detection/ssd

official code: https://github.com/pytorch/vision/blob/main/torchvision/models/detection/ssd.py

YOLOv3-spp

https://github.com/WZMIAOMIAO/deep-learning-for-image-processing/tree/master/pytorch_object_detection/yolov3-spp

YOLOv5

https://github.com/ultralytics/yolov5

Facebook

Twitter

Click to copy link

Link copied

Cite

new-workspace-zecmk (2024). Dior R Dataset [Dataset]. https://universe.roboflow.com/new-workspace-zecmk/dior-r-jbpln/dataset/1

Dior R Dataset

dior-r-jbpln

dior-r-dataset

Explore at:

zipAvailable download formats

Dataset updated

Dec 25, 2024

Dataset authored and provided by

new-workspace-zecmk

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Variables measured

DIOR R Bounding Boxes

Description

DIOR R

## Overview

DIOR R is a dataset for object detection tasks - it contains DIOR R annotations for 23,419 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

  ## License

  This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).

Clear search

Close search

Google apps

Main menu

Dior R Dataset

DIOR R

Stirred Reactor CFD ML-Dataset

Data from: Ecosystem-Level Determinants of Sustained Activity in Open-Source...

Replication Data for: Revisiting 'The Rise and Decline' in a Population of...

Current Population Survey (CPS)

bb_to_run_c

Dataset

Contents

R dataset for processing 16S rRNA sequences

Simple download service (Atom) of the dataset: Aerodrome Exposure Plan...

i.c.sens Visual-Inertial-LiDAR Dataset

Related publications:

Health and Retirement Study (HRS)

Data release for solar-sensor angle analysis subset associated with the...

Datasets for Sentiment Analysis

Dataset Direct Download Service (WFS): Hunting and Wildlife Area (Savoie)

R code for obtaining species occurrence data from open datasets

Dataset Direct Download Service (WFS): Easement plate A5 linked to public...

Market Basket Analysis

Market Basket Analysis

Introduction

An Example of Association Rules

Strategy

Dataset Description

Libraries in R

Data Pre-processing

Simple download service (Atom) of the dataset: N BRUIT ZBR INFRA R A0006 A...

Cleaned NHANES 1988-2018

Augmentasi R Dataset

Augmentasi R

Personal Protective Equipment Dataset (PPED)

Dior R DatasetSee More Versions

dior-r-jbpln

dior-r-dataset

DIOR R

Dior R Dataset