100+ datasets found

Dataset of books series that contain Delete
workwithdata.com
Updated Nov 25, 2024
Cite
Work With Data (2024). Dataset of books series that contain Delete [Dataset]. https://www.workwithdata.com/datasets/book-series?f=1&fcol0=j0-book&fop0=%3D&fval0=Delete&j=1&j0=books
Explore at:
Dataset updated
Nov 25, 2024
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about book series. It has 1 row and is filtered where the books is Delete. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
Delete Dataset
universe.roboflow.com
zip
Updated Sep 16, 2023
Cite
TEC (2023). Delete Dataset [Dataset]. https://universe.roboflow.com/tec-h4jb3/delete-ny7cg
Explore at:
Available download formats: zip
Dataset updated
Sep 16, 2023
Dataset authored and provided by
TEC
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Variables measured
Game Objects Bounding Boxes
Description
Delete

## Overview

Delete is a dataset for object detection tasks - it contains Game Objects annotations for 936 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [MIT license](https://creativecommons.org/licenses/MIT).
Public Dataset Access and Usage
data.sfgov.org
s.cnmilf.com
+2 more
application/rdfxml +5
Updated Jul 23, 2025
Cite
(2025). Public Dataset Access and Usage [Dataset]. https://data.sfgov.org/City-Infrastructure/Public-Dataset-Access-and-Usage/su99-qvi4
Explore at:
Available download formats: csv, application/rssxml, json, tsv, application/rdfxml, xml
Dataset updated
Jul 23, 2025
Description
A. SUMMARY This dataset is used to report on public dataset access and usage within the open data portal. Each row gives the number of users who accessed a dataset on a given day, grouped by access type (API Read, Download, Page View, etc).

B. HOW THE DATASET IS CREATED This dataset is created by joining two internal analytics datasets generated by the SF Open Data Portal. We remove non-public information during the process.

C. UPDATE PROCESS This dataset is scheduled to update every 7 days via ETL.

D. HOW TO USE THIS DATASET This dataset can help you identify stale datasets, highlight the most popular datasets and calculate other metrics around the performance and usage in the open data portal.

Please note a special call-out for two fields:
- "derived": This field shows whether an asset is an original source (derived = "False") or is made from another asset through filtering (derived = "True").
- "provenance": This field shows whether an asset is "official" (created by someone in the city of San Francisco) or "community" (created by a member of the community, not official). All community assets are derived, as members of the community cannot add data to the open data portal.
Addresses (Open Data)
catalog.data.gov
data.tempe.gov
+10 more
Updated Jul 26, 2025
+ more versions
Cite
City of Tempe (2025). Addresses (Open Data) [Dataset]. https://catalog.data.gov/dataset/addresses-open-data
Explore at:
Dataset updated
Jul 26, 2025
Dataset provided by
City of Tempe
Description
This dataset is a compilation of address point data for the City of Tempe. The dataset contains a point location and the official address (as defined by The Building Safety Division of Community Development) for all occupiable units and any other official addresses in the City. There are several additional attributes that may be populated for an address, but they may not be populated for every address.

Contact: Lynn Flaaen-Hanna, Development Services Specialist
Contact E-mail:
Link: Map that Lets You Explore and Export Address Data
Data Source: The initial dataset was created by combining several datasets and then reviewing the information to remove duplicates and identify errors. This published dataset is the system of record for Tempe addresses going forward, with the address information being created and maintained by The Building Safety Division of Community Development.
Data Source Type: ESRI ArcGIS Enterprise Geodatabase
Preparation Method: N/A
Publish Frequency: Weekly
Publish Method: Automatic
Data Dictionary
ckanext-cancel-dataset-creation
catalog.civicdataecosystem.org
Updated Jun 4, 2025
Cite
(2025). ckanext-cancel-dataset-creation [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-cancel-dataset-creation
Explore at:
Dataset updated
Jun 4, 2025
Description
The cancel-dataset-creation extension for CKAN provides users with the ability to delete draft datasets and cancel dataset creation mid-process. This extension streamlines the dataset management experience by allowing for the removal of unwanted or incomplete datasets directly from the CKAN interface. It addresses the common scenario where datasets are started but not finalized, preventing clutter and improving overall data governance.

Key Features:
- Draft Dataset Deletion: Enables users to delete datasets that are in a draft state, removing them entirely from the CKAN instance and preventing them from being published.
- In-Progress Dataset Cancellation: Allows users to cancel the dataset creation process while it is in progress and remove any partially created data associated with the dataset.
- Integration with CKAN Workflow: Seamlessly integrates into the existing CKAN dataset creation workflow, providing intuitive options to cancel or delete datasets at various stages.

Technical Integration: The extension integrates with CKAN by adding new functionalities directly into the platform's dataset management interface. This offers accessible controls to delete datasets, ensuring compatibility with CKAN's native features. Simply add cancel_dataset_creation to the ckan.plugins setting in your CKAN config file.

Benefits & Impact: By implementing the cancel-dataset-creation extension, CKAN users can effectively manage and remove unwanted datasets, resulting in a cleaner and more organized data catalog. This reduces the risk of publishing incomplete datasets and streamlines the overall data management process.
DELETE-bot-fight-data
huggingface.co
Updated May 16, 2024
+ more versions
Cite
Huggingface Projects (2024). DELETE-bot-fight-data [Dataset]. https://huggingface.co/datasets/huggingface-projects/DELETE-bot-fight-data
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
May 16, 2024
Dataset provided by
Hugging Face: https://huggingface.co/
Authors
Huggingface Projects
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
huggingface-projects/DELETE-bot-fight-data dataset hosted on Hugging Face and contributed by the HF Datasets community
Delete Dataset
universe.roboflow.com
zip
Updated May 23, 2024
+ more versions
Cite
MSC onlab 1 (2024). Delete Dataset [Dataset]. https://universe.roboflow.com/msc-onlab-1/delete-k0ltl/dataset/7
Explore at:
Available download formats: zip
Dataset updated
May 23, 2024
Dataset authored and provided by
MSC onlab 1
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Asd Bounding Boxes
Description
Delete

## Overview

Delete is a dataset for object detection tasks - it contains Asd annotations for 2,109 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Metadata for Datasets and Relationships
figshare.com
bin
Updated Aug 23, 2024
Cite
Kate Lin; Tarfah Alrashed; Natasha Noy (2024). Metadata for Datasets and Relationships [Dataset]. http://doi.org/10.6084/m9.figshare.22790810.v6
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.22790810.v6
Dataset updated
Aug 23, 2024
Dataset provided by
Figshare: http://figshare.com/
Authors
Kate Lin; Tarfah Alrashed; Natasha Noy
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains two tables. One table contains metadata for "citable" datasets (datasets that have either a DOI or a compact identifier). The other table contains the relationships between each pair of datasets in the first table.

We generated this corpus of dataset metadata by crawling the Web to find pages with schema.org or DCAT metadata indicating that the page contains a dataset. The metadata for datasets includes information such as the dataset’s name, description, provider, creation date, Digital Object Identifiers (DOI), and more. Out of the 46 million dataset pages that have schema.org, we publish this subset of 4.3 million dataset-metadata entries that are citable. We also include an additional table on relationships between these datasets.

Please contact dataset-search@googlegroups.com if you have any questions or requests to remove a dataset that you own from this collection.
3.07 AZ Merit Data (summary)
catalog.data.gov
data-academy.tempe.gov
+13 more
Updated Jan 17, 2025
+ more versions
Cite
City of Tempe (2025). 3.07 AZ Merit Data (summary) [Dataset]. https://catalog.data.gov/dataset/3-07-az-merit-data-summary-55307
Explore at:
Dataset updated
Jan 17, 2025
Dataset provided by
City of Tempe
Description
This page provides data for the 3rd Grade Reading Level Proficiency performance measure. The dataset includes the student performance results on the English/Language Arts section of the AzMERIT from Fall 2017 and Spring 2018. Data is representative of students in third grade in public elementary schools in Tempe. This includes schools from both Tempe Elementary and Kyrene districts. Results are by school and provide the total number of students tested, total percentage passing and percentage of students scoring at each of the four levels of proficiency. The performance measure dashboard is available at 3.07 3rd Grade Reading Level Proficiency.

Additional Information
Source: Arizona Department of Education
Contact: Ann Lynn DiDomenico
Contact E-Mail: Ann_DiDomenico@tempe.gov
Data Source Type: Excel/CSV
Preparation Method: Filters on original dataset: within "Schools" Tab School District [select Tempe School District and Kyrene School District]; School Name [deselect Kyrene SD not in Tempe city limits]; Content Area [select English Language Arts]; Test Level [select Grade 3]; Subgroup/Ethnicity [select All Students]. Remove irrelevant fields; add Fiscal Year.
Publish Frequency: Annually as data becomes available
Publish Method: Manual
Data Dictionary
pii-masking-300k
huggingface.co
Updated Apr 4, 2024
Cite
Ai4Privacy (2024). pii-masking-300k [Dataset]. http://doi.org/10.57967/hf/1995
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.57967/hf/1995
Dataset updated
Apr 4, 2024
Dataset authored and provided by
Ai4Privacy
License
https://choosealicense.com/licenses/other/
Description
Purpose and Features

🌍 World's largest open dataset for privacy masking 🌎 The dataset is useful to train and evaluate models to remove personally identifiable and sensitive information from text, especially in the context of AI assistants and LLMs. Key facts:

OpenPII-220k text entries have 27 PII classes (types of sensitive data), targeting 749 discussion subjects / use cases split across education, health, and psychology. FinPII contains an additional ~20 types tailored to… See the full description on the dataset page: https://huggingface.co/datasets/ai4privacy/pii-masking-300k.
Training dataset for NABat Machine Learning V1.0
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
+ more versions
Cite
U.S. Geological Survey (2024). Training dataset for NABat Machine Learning V1.0 [Dataset]. https://catalog.data.gov/dataset/training-dataset-for-nabat-machine-learning-v1-0
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
U.S. Geological Survey
Description
Bats play crucial ecological roles and provide valuable ecosystem services, yet many populations face serious threats from various ecological disturbances. The North American Bat Monitoring Program (NABat) aims to assess status and trends of bat populations while developing innovative and community-driven conservation solutions using its unique data and technology infrastructure. To support scalability and transparency in the NABat acoustic data pipeline, we developed a fully-automated machine-learning algorithm. This dataset includes audio files of bat echolocation calls that were considered to develop V1.0 of the NABat machine-learning algorithm, however the test set (i.e., holdout dataset) has been excluded from this release. These recordings were collected by various bat monitoring partners across North America using ultrasonic acoustic recorders for stationary acoustic and mobile acoustic surveys. For more information on how these surveys may be conducted, see Chapters 4 and 5 of “A Plan for the North American Bat Monitoring Program” (https://doi.org/10.2737/SRS-GTR-208). These data were then post-processed by bat monitoring partners to remove noise files (or those that do not contain recognizable bat calls) and apply a species label to each file. There is undoubtedly variation in the steps that monitoring partners take to apply a species label, but the steps documented in “A Guide to Processing Bat Acoustic Data for the North American Bat Monitoring Program” (https://doi.org/10.3133/ofr20181068) include first processing with an automated classifier and then manually reviewing to confirm or downgrade the suggested species label. Once a manual ID label was applied, audio files of bat acoustic recordings were submitted to the NABat database in Waveform Audio File format. From these available files in the NABat database, we considered files from 35 classes (34 species and a noise class). 
Files for 4 species were excluded due to low sample size (Corynorhinus rafinesquii, N = 3; Eumops floridanus, N = 3; Lasiurus xanthinus, N = 4; Nyctinomops femorosaccus, N = 11). From this pool, files were randomly selected until files for each species/grid cell combination were exhausted or the number of recordings reached 1250. The dataset was then randomly split into training, validation, and test sets (i.e., holdout dataset). This data release includes all files considered for training and validation, including files that had been excluded from model development and testing due to low sample size for a given species or because the threshold for species/grid cell combinations had been met. The test set (i.e., holdout dataset) is not included. Audio files are grouped by species, as indicated by the four-letter species code in the name of each folder. Definitions for each four-letter code, including Family, Genus, Species, and Common name, are also included as a dataset in this release.
RedPajama-Data-Instruct
huggingface.co
Updated Oct 15, 2004
Cite
Together (2004). RedPajama-Data-Instruct [Dataset]. https://huggingface.co/datasets/togethercomputer/RedPajama-Data-Instruct
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Oct 15, 2004
Dataset authored and provided by
Together
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Summary

RedPajama-Instruct-Data is curated from a diverse collection of NLP tasks from both P3 (BigScience) and Natural Instruction (AI2), with aggressive decontamination against HELM conducted in two steps: (1) We first conduct semantic search using each validation example in HELM as the query, get the top-100 similar instances from the Instruct data set, and check for tasks that have any returned instances overlapping (using 10-grams) with the validation example. We remove the… See the full description on the dataset page: https://huggingface.co/datasets/togethercomputer/RedPajama-Data-Instruct.
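
The 10-gram overlap check mentioned in step (1) can be illustrated with a small sketch. This is not the authors' code: the whitespace tokenization and the function names `ngrams` and `has_ngram_overlap` are assumptions made for illustration, and the truncated remainder of the procedure is not reproduced here.

```python
def ngrams(text, n=10):
    """Return the set of word n-grams in `text` (whitespace tokenization is an assumption)."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def has_ngram_overlap(candidate, reference, n=10):
    """True if the two texts share at least one n-gram."""
    return not ngrams(candidate, n).isdisjoint(ngrams(reference, n))

# Made-up strings, one letter per "word", purely to exercise the check.
reference = "a b c d e f g h i j k"
leaky = "x a b c d e f g h i j y"   # contains the 10-gram a..j
clean = "a b c d e f g h i z j"     # shares at most 9 consecutive words

print(has_ngram_overlap(leaky, reference))
print(has_ngram_overlap(clean, reference))
```

A set-based comparison keeps the check simple; for a large instruction corpus one would more likely index the n-grams of the validation examples once and probe that index.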
Secom Dataset
kaggle.com
Updated Oct 4, 2023
Cite
G Creatives (2023). Secom Dataset [Dataset]. https://www.kaggle.com/datasets/gcreatives/secom-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Oct 4, 2023
Dataset provided by
Kaggle: http://kaggle.com/
Authors
G Creatives
Description
Title: SECOM Data Set

Abstract: Data from a semi-conductor manufacturing process

Data Set Characteristics: Multivariate
Number of Instances: 1567
Area: Computer
Attribute Characteristics: Real
Number of Attributes: 591
Date Donated: 2008-11-19
Associated Tasks: Classification, Causal-Discovery
Missing Values? Yes

Source:

Authors: Michael McCann, Adrian Johnston

Data Set Information:

A complex modern semi-conductor manufacturing process is normally under consistent surveillance via the monitoring of signals/variables collected from sensors and/or process measurement points. However, not all of these signals are equally valuable in a specific monitoring system. The measured signals contain a combination of useful information, irrelevant information as well as noise. It is often the case that useful information is buried in the latter two. Engineers typically have a much larger number of signals than are actually required. If we consider each type of signal as a feature, then feature selection may be applied to identify the most relevant signals. The Process Engineers may then use these signals to determine key factors contributing to yield excursions downstream in the process. This will enable an increase in process throughput, a decrease in time to learning, and a reduction in per-unit production costs.

To enhance current business improvement techniques the application of feature selection as an intelligent systems technique is being investigated.

The dataset presented in this case represents a selection of such features, where each example represents a single production entity with associated measured features, and the labels represent a simple pass/fail yield for in-house line testing (figure 2) and an associated date time stamp. Here -1 corresponds to a pass and 1 corresponds to a fail, and the date time stamp is for that specific test point.

Using feature selection techniques it is desired to rank features according to their impact on the overall yield for the product, causal relationships may also be considered with a view to identifying the key features.

Results may be submitted in terms of feature relevance for predictability using error rates as our evaluation metrics. It is suggested that cross validation be applied to generate these results. Some baseline results are shown below for basic feature selection techniques using a simple kernel ridge classifier and 10 fold cross validation.

Baseline Results: Pre-processing objects were applied to the dataset simply to standardize the data and remove the constant features and then a number of different feature selection objects selecting 40 highest ranked features were applied with a simple classifier to achieve some initial results. 10 fold cross validation was used and the balanced error rate (*BER) generated as our initial performance metric to help investigate this dataset.

SECOM Dataset: 1567 examples 591 features, 104 fails

FSmethod (40 features)   BER %        True + %      True - %
S2N (signal to noise)    34.5 +-2.6   57.8 +-5.3    73.1 +2.1
Ttest                    33.7 +-2.1   59.6 +-4.7    73.0 +-1.8
Relief                   40.1 +-2.8   48.3 +-5.9    71.6 +-3.2
Pearson                  34.1 +-2.0   57.4 +-4.3    74.4 +-4.9
Ftest                    33.5 +-2.2   59.1 +-4.8    73.8 +-1.8
Gram Schmidt             35.6 +-2.4   51.2 +-11.8   77.5 +-2.3
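
The source does not define BER, but the baseline figures above are consistent with the common definition of balanced error rate as the mean of the two per-class error rates. A minimal sketch under that assumption (the function name `balanced_error_rate` is hypothetical):

```python
def balanced_error_rate(true_pos_pct, true_neg_pct):
    """BER in percent, assuming BER = mean of the two per-class error rates."""
    return ((100.0 - true_pos_pct) + (100.0 - true_neg_pct)) / 2.0

# (True + %, True - %) pairs copied from three rows of the baseline results.
rows = {"S2N": (57.8, 73.1), "Ttest": (59.6, 73.0), "Ftest": (59.1, 73.8)}
for name, (tp, tn) in rows.items():
    # Compare the printed value with the BER % reported for the same method.
    print(name, round(balanced_error_rate(tp, tn), 2))
```

With only 104 fails among 1567 examples, a plain error rate would reward a classifier that always predicts "pass"; averaging the per-class rates avoids that.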

Attribute Information:

Key facts: Data Structure: The data consists of 2 files: the dataset file SECOM, consisting of 1567 examples each with 591 features (a 1567 x 591 matrix), and a labels file containing the classifications and date time stamp for each example.

As with any real-life data situation, this data contains null values varying in intensity depending on the individual features. This needs to be taken into consideration when investigating the data, either through pre-processing or within the technique applied.

The data is represented in a raw text file, each line representing an individual example and the features separated by spaces. The null values are represented by the 'NaN' value as per MatLab.
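
The file layout described above (one example per line, space-separated features, 'NaN' for nulls) can be read with the Python standard library alone. The sketch below is illustrative only: the helper names are hypothetical, the sample values are made up rather than taken from SECOM, and the constant-feature filter mirrors the pre-processing mentioned in the baseline description only in spirit.

```python
import io
import math

def load_space_separated(handle):
    """Parse one example per line; float() accepts the literal 'NaN'."""
    return [[float(tok) for tok in line.split()] for line in handle if line.strip()]

def drop_constant_columns(rows):
    """Keep only columns whose non-null values take more than one distinct value."""
    keep = []
    for j in range(len(rows[0])):
        values = {r[j] for r in rows if not math.isnan(r[j])}
        if len(values) > 1:
            keep.append(j)
    return [[r[j] for j in keep] for r in rows], keep

# Tiny made-up sample in the same layout (not real SECOM values).
sample = io.StringIO("1.0 5.0 NaN\n2.0 5.0 0.3\n3.0 5.0 0.1\n")
rows = load_space_separated(sample)
filtered, kept = drop_constant_columns(rows)
print(kept)  # indices of the columns that survived the filter
```

For the real file, an open file handle can be passed in place of the `io.StringIO` object; how to impute or drop the remaining NaN values is left to the chosen technique.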
Data from: Methodology to filter out outliers in high spatial density data...
scielo.figshare.com
jpeg
Updated Jun 4, 2023
Cite
Leonardo Felipe Maldaner; José Paulo Molin; Mark Spekken (2023). Methodology to filter out outliers in high spatial density data to improve maps reliability [Dataset]. http://doi.org/10.6084/m9.figshare.14305658.v1
Explore at:
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.14305658.v1
Dataset updated
Jun 4, 2023
Dataset provided by
SciELO journals
Authors
Leonardo Felipe Maldaner; José Paulo Molin; Mark Spekken
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ABSTRACT The considerable volume of data generated by sensors in the field presents systematic errors; thus, it is extremely important to exclude these errors to ensure mapping quality. The objective of this research was to develop and test a methodology to identify and exclude outliers in high-density spatial data sets, determine whether the developed filter process could help decrease the nugget effect and improve the spatial variability characterization of high sampling data. We created a filter composed of a global, anisotropic, and an anisotropic local analysis of data, which considered the respective neighborhood values. For that purpose, we used the median to classify a given spatial point into the data set as the main statistical parameter and took into account its neighbors within a radius. The filter was tested using raw data sets of corn yield, soil electrical conductivity (ECa), and the sensor vegetation index (SVI) in sugarcane. The results showed an improvement in accuracy of spatial variability within the data sets. The methodology reduced RMSE by 85 %, 97 %, and 79 % in corn yield, soil ECa, and SVI respectively, compared to interpolation errors of raw data sets. The filter excluded the local outliers, which considerably reduced the nugget effects, reducing estimation error of the interpolated data. The methodology proposed in this work had a better performance in removing outlier data when compared to two other methodologies from the literature.
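
The local, median-based step of such a filter can be sketched as follows. This is a simplified illustration and not the authors' method: it uses an isotropic radius, omits the global and anisotropic analyses, and the relative-deviation threshold `max_rel_dev` is an assumption, as are all names.

```python
import math
import statistics

def flag_local_outliers(points, radius, max_rel_dev):
    """points: list of (x, y, value). Returns indices flagged as local outliers.

    A point is flagged when its value deviates from the median of its
    neighbours within `radius` by more than `max_rel_dev` (a fraction).
    The threshold rule is an assumption made for this sketch.
    """
    flagged = []
    for i, (xi, yi, vi) in enumerate(points):
        neighbours = [
            v for j, (x, y, v) in enumerate(points)
            if j != i and math.hypot(x - xi, y - yi) <= radius
        ]
        if not neighbours:
            continue  # isolated point: nothing to compare against
        med = statistics.median(neighbours)
        if med != 0 and abs(vi - med) / abs(med) > max_rel_dev:
            flagged.append(i)
    return flagged

# Made-up yield-like readings; the fourth point is far from its neighbours' median.
pts = [(0, 0, 10.0), (1, 0, 10.5), (0, 1, 9.8), (1, 1, 30.0), (5, 5, 10.0)]
print(flag_local_outliers(pts, radius=1.5, max_rel_dev=0.5))
```

The pairwise distance loop is quadratic in the number of points; high-density field data would call for a spatial index instead.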
Data from: PCP-SAFT Parameters of Pure Substances Using Large Experimental...
acs.figshare.com
zip
Updated Sep 6, 2023
Cite
Timm Esper; Gernot Bauer; Philipp Rehner; Joachim Gross (2023). PCP-SAFT Parameters of Pure Substances Using Large Experimental Databases [Dataset]. http://doi.org/10.1021/acs.iecr.3c02255.s001
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.1021/acs.iecr.3c02255.s001
Dataset updated
Sep 6, 2023
Dataset provided by
ACS Publications
Authors
Timm Esper; Gernot Bauer; Philipp Rehner; Joachim Gross
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This work reports pure component parameters for the PCP-SAFT equation of state for 1842 substances using a total of approximately 551 172 experimental data points for vapor pressure and liquid density. We utilize data from commercial and public databases in combination with an automated workflow to assign chemical identifiers to all substances, remove duplicate data sets, and filter unsuited data. The use of raw experimental data, as opposed to pseudoexperimental data from empirical correlations, requires means to identify and remove outliers, especially for vapor pressure data. We apply robust regression using a Huber loss function. For identifying and removing outliers, the empirical Wagner equation for vapor pressure is adjusted to experimental data, because the Wagner equation is mathematically rather flexible and is thus not subject to a systematic model bias. For adjusting model parameters of the PCP-SAFT model, nonpolar, dipolar and associating substances are distinguished. The resulting substance-specific parameters of the PCP-SAFT equation of state yield a mean absolute relative deviation of 2.73% for vapor pressure and 0.52% for liquid densities (2.56% and 0.47% for nonpolar substances, 2.67% and 0.61% for dipolar substances, and 3.24% and 0.54% for associating substances) when evaluated against outlier-removed data. All parameters are provided as JSON and CSV files.
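
For readers unfamiliar with the Huber loss mentioned above, the standard textbook form is quadratic for small residuals and linear for large ones, which limits the pull of outliers on a fit. A minimal sketch; the threshold `delta` and its default are assumptions, since the description does not state the value used:

```python
def huber_loss(residual, delta=1.0):
    """Standard Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)

# Compare with the squared-error loss 0.5 * r**2 for a small and a large residual.
for r in (0.5, 4.0):
    print(r, huber_loss(r), 0.5 * r * r)
```

For the residual of 4.0 the Huber value grows linearly while the squared-error value grows quadratically, which is why a handful of bad vapor pressure points distort a robust regression less.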
Dataset of books called The feud in the fifth remove
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books called The feud in the fifth remove [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=The+feud+in+the+fifth+remove
Explore at:
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 1 row and is filtered where the book is The feud in the fifth remove. It features 7 columns including author, publication date, language, and book publisher.
wiki40b
tensorflow.org
opendatalab.com
+1 more
Updated Aug 30, 2023
+ more versions
Cite
(2023). wiki40b [Dataset]. https://www.tensorflow.org/datasets/catalog/wiki40b
Explore at:
Dataset updated
Aug 30, 2023
Description
Cleaned-up text for 40+ Wikipedia language editions of pages that correspond to entities. The datasets have train/dev/test splits per language. The dataset is cleaned up by page filtering to remove disambiguation pages, redirect pages, deleted pages, and non-entity pages. Each example contains the wikidata id of the entity, and the full Wikipedia article after page processing that removes non-content sections and structured objects. The language models trained on this corpus - including 41 monolingual models, and 2 multilingual models - can be found at https://tfhub.dev/google/collections/wiki40b-lm/1.

To use this dataset:

    import tensorflow_datasets as tfds
    ds = tfds.load('wiki40b', split='train')
    for ex in ds.take(4):
        print(ex)

See the guide for more information on tensorflow_datasets.
Data set for co-contaminant paper
catalog.data.gov
Updated Oct 3, 2021
Cite
U.S. EPA Office of Research and Development (ORD) (2021). Data set for co-contaminant paper [Dataset]. https://catalog.data.gov/dataset/data-set-for-co-contaminant-paper
Explore at:
Dataset updated
Oct 3, 2021
Dataset provided by
United States Environmental Protection Agency: http://www.epa.gov/
Description
The objective of this research was to evaluate the performance of five arsenic adsorptive media technology of full-scale drinking water treatment systems in the USEPA Arsenic Demonstration Program (ADP) that were found to simultaneously remove other co-occurring inorganic contaminants. The datasets presented here are the figures included in the referenced journal article. This dataset is associated with the following publication: Sorg, T., A. Chen, L. Wang, and D. Lytle. Removing co-occurring contaminants of arsenic and vanadium with full-scale arsenic adsorptive media systems. JOURNAL OF WATER SUPPLY: RESEARCH AND TECHNOLOGY - AQUA. IWA Publishing, London, UK, ( ): 2021148, (2021).
KU-HAR: An Open Dataset for Human Activity Recognition
data.mendeley.com
Updated Feb 16, 2021
+ more versions
Cite
Abdullah-Al Nahid (2021). KU-HAR: An Open Dataset for Human Activity Recognition [Dataset]. http://doi.org/10.17632/45f952y38r.5
Explore at:
Unique identifier
https://doi.org/10.17632/45f952y38r.5
Dataset updated
Feb 16, 2021
Authors
Abdullah-Al Nahid
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
(Always use the latest version of the dataset.)

Human Activity Recognition (HAR) refers to the capacity of machines to perceive human actions. This dataset contains information on 18 different activities collected from 90 participants (75 male and 15 female) using smartphone sensors (Accelerometer and Gyroscope). It has 1945 raw activity samples collected directly from the participants, and 20750 subsamples extracted from them. The activities are:

Stand➞ Standing still (1 min)
Sit➞ Sitting still (1 min)
Talk-sit➞ Talking with hand movements while sitting (1 min)
Talk-stand➞ Talking with hand movements while standing or walking (1 min)
Stand-sit➞ Repeatedly standing up and sitting down (5 times)
Lay➞ Laying still (1 min)
Lay-stand➞ Repeatedly standing up and laying down (5 times)
Pick➞ Picking up an object from the floor (10 times)
Jump➞ Jumping repeatedly (10 times)
Push-up➞ Performing full push-ups (5 times)
Sit-up➞ Performing sit-ups (5 times)
Walk➞ Walking 20 meters (≈12 s)
Walk-backward➞ Walking backward for 20 meters (≈20 s)
Walk-circle➞ Walking along a circular path (≈20 s)
Run➞ Running 20 meters (≈7 s)
Stair-up➞ Ascending on a set of stairs (≈1 min)
Stair-down➞ Descending from a set of stairs (≈50 s)
Table-tennis➞ Playing table tennis (1 min)

Contents of the attached .zip files are:

1.Raw_time_domian_data.zip➞ Originally collected 1945 time-domain samples in separate .csv files. The arrangement of information in each .csv file is:
Column 1, 5➞ exact time (elapsed since the start) when the Accelerometer & Gyro output was recorded (in ms)
Col. 2, 3, 4➞ Acceleration along X,Y,Z axes (in m/s^2)
Col. 6, 7, 8➞ Rate of rotation around X,Y,Z axes (in rad/s)

2.Trimmed_interpolated_raw_data.zip➞ Unnecessary parts of the samples were trimmed (only from the beginning and the end). The samples were interpolated to keep a constant sampling rate of 100 Hz. The arrangement of information is the same as above.

3.Time_domain_subsamples.zip➞ 20750 subsamples extracted from the 1945 collected samples provided in a single .csv file. Each of them contains 3 seconds of non-overlapping data of the corresponding activity. Arrangement of information:
Col. 1–300, 301–600, 601–900➞ Acc.meter X, Y, Z axes readings
Col. 901–1200, 1201–1500, 1501–1800➞ Gyro X, Y, Z axes readings
Col. 1801➞ Class ID (0 to 17, in the order mentioned above)
Col. 1802➞ length of each channel's data in the subsample
Col. 1803➞ serial no. of the subsample
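
The column arrangement above maps directly onto list slices. The following sketch splits one 1803-value row into its six sensor channels and three metadata fields; the channel names, the function name `split_subsample`, and the synthetic row are assumptions made for illustration, not part of the dataset.

```python
CHANNELS = ["acc_x", "acc_y", "acc_z", "gyro_x", "gyro_y", "gyro_z"]
SAMPLES_PER_CHANNEL = 300  # 300 columns per channel, per the layout above

def split_subsample(row):
    """Split one 1803-value row into six channels plus three metadata fields."""
    if len(row) != 1803:
        raise ValueError("expected 1803 columns, got %d" % len(row))
    channels = {
        name: row[i * SAMPLES_PER_CHANNEL:(i + 1) * SAMPLES_PER_CHANNEL]
        for i, name in enumerate(CHANNELS)
    }
    # Columns 1801-1803 (1-based) are class ID, channel length, serial number.
    class_id, length, serial = int(row[1800]), int(row[1801]), int(row[1802])
    return channels, class_id, length, serial

# Synthetic row: 1-based column k holds the value k, followed by made-up metadata.
row = [float(k) for k in range(1, 1801)] + [17.0, 300.0, 42.0]
channels, class_id, length, serial = split_subsample(row)
print(channels["gyro_x"][0], class_id, length, serial)
```

Rows read with `csv.reader` arrive as strings, so they would need converting with `float()` before being passed in.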

Gravity acceleration was omitted from the Acc.meter data, and no filter was applied to remove noise. The dataset is free to download, modify, and use.

More information is provided in the data paper which is currently under review: N. Sikder, A.-A. Nahid, KU-HAR: An open dataset for heterogeneous human activity recognition, Pattern Recognit. Lett. (submitted).

A preprint will be available soon.

Backup: drive.google.com/drive/folders/1yrG8pwq3XMlyEGYMnM-8xnrd6js0oXA7
City of Tempe 2023 Business Survey Data
catalog.data.gov
s.cnmilf.com
+9more
Updated Sep 20, 2024
+ more versions
City of Tempe (2024). City of Tempe 2023 Business Survey Data [Dataset]. https://catalog.data.gov/dataset/city-of-tempe-2023-business-survey-data
Explore at:
Dataset updated
Sep 20, 2024
Dataset provided by
City of Tempe
Area covered
Tempe
Description
These data include the individual responses for the City of Tempe Annual Business Survey conducted by ETC Institute. These data help determine priorities for the community as part of the City's ongoing strategic planning process. Averaged Business Survey results are used as indicators for city performance measures. The performance measures with indicators from the Business Survey include the following (as of 2023):

1. Financial Stability and Vitality
5.01 Quality of Business Services

The location data in this dataset is generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When the data are shared with the city, only the latitude/longitude of the block-level address points are provided. This results in points that overlap. To better visualize the data, overlapping points were randomly dispersed to remove overlap. The result of these two adjustments ensures that the points are not related to a specific address, but are still close enough to allow insights about service delivery in different areas of the city.

Additional Information
Source: Business Survey
Contact (author): Adam Samuels
Contact E-Mail (author): Adam_Samuels@tempe.gov
Contact (maintainer):
Contact E-Mail (maintainer):
Data Source Type: Excel table
Preparation Method: Data received from vendor after report is completed
Publish Frequency: Annual
Publish Method: Manual
Data Dictionary

Methods:
The survey is mailed to a random sample of businesses in the City of Tempe. Follow-up emails and texts are also sent to encourage participation. A link to the survey is provided with each communication. To prevent people who do not live in Tempe or who were not selected as part of the random sample from completing the survey, everyone who completed the survey was required to provide their address. These addresses were then matched to those used for the random representative sample. If the respondent’s address did not match, the response was not used. To better understand how services are being delivered across the city, individual results were mapped to determine overall distribution across the city.

Processing and Limitations:
The location data is generalized to the block level and overlapping points are randomly dispersed, as described above. The data are used by the ETC Institute in the final published PDF report.
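The dispersal of overlapping points can be pictured with a small sketch. The description does not say how the points were actually dispersed, so the uniform random offset, the `max_offset` value, the function name `disperse_overlaps`, and the coordinates below are all assumptions made for illustration.

```python
import numpy as np

def disperse_overlaps(points, max_offset=0.0005, seed=0):
    """Add a small random offset to every point that shares its exact
    coordinates with another point; unique points are left unchanged.

    Assumption: uniform jitter in degrees; the method the city or its
    vendor used is not documented in the description.
    """
    pts = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Map each point to its group of identical coordinates.
    _, inverse, counts = np.unique(
        pts, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)
    overlapping = counts[inverse] > 1
    out = pts.copy()
    offsets = rng.uniform(
        -max_offset, max_offset, size=(int(overlapping.sum()), 2)
    )
    out[overlapping] += offsets
    return out

# Made-up block-level points (latitude, longitude); the first two overlap.
pts = np.array([[33.42, -111.93], [33.42, -111.93], [33.40, -111.90]])
out = disperse_overlaps(pts)
```

Bounding the offset keeps each dispersed point near its block-level location, which matches the stated goal of staying close enough for area-level insights without pointing at a specific address.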


Dataset of books series that contain Delete

Delete Dataset

Delete

Public Dataset Access and Usage

Addresses (Open Data)

ckanext-cancel-dataset-creation

DELETE-bot-fight-data

Delete Dataset

Delete

Metadata for Datasets and Relationships

3.07 AZ Merit Data (summary)

pii-masking-300k

Training dataset for NABat Machine Learning V1.0

RedPajama-Data-Instruct

Secom Dataset

Data from: Methodology to filter out outliers in high spatial density data...

Data from: PCP-SAFT Parameters of Pure Substances Using Large Experimental...

Dataset of books called The feud in the fifth remove

wiki40b

Data set for co-contaminant paper

KU-HAR: An Open Dataset for Human Activity Recognition

City of Tempe 2023 Business Survey Data

Dataset of books series that contain Delete