This dataset tracks the number of days since the row count on a dataset asset has changed. Its purpose is to ensure datasets are updating as expected. This dataset is identical to the Socrata Asset Inventory, with added Checkpoint Date and Days Since Row Count Change attributes.
https://creativecommons.org/publicdomain/zero/1.0/
This data was collected as a course project for the immersive data science course (by General Assembly and Misk Academy).
This dataset is in CSV format. It consists of 5,717 rows and 15 columns, where each row is a dataset on Kaggle and each column represents a feature of that dataset.

|Feature|Description|
|-------|-----------|
|title| dataset name |
|usability| dataset usability rating by Kaggle |
|num_of_files| number of files associated with the dataset |
|types_of_files| types of files associated with the dataset |
|files_size| size of the dataset files |
|vote_counts| total vote count by dataset viewers |
|medal| reward for popular datasets, measured by the number of upvotes (votes by novices are excluded from medal calculation): Bronze = 5 votes, Silver = 20 votes, Gold = 50 votes |
|url_reference| reference to the dataset page on Kaggle in the format: www.kaggle.com/url_reference |
|keywords| topics tagged with the dataset |
|num_of_columns| number of features in the dataset |
|views| number of views |
|downloads| number of downloads |
|download_per_view| download-per-view ratio |
|date_created| dataset creation date |
|last_updated| date of the last update |
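As an illustration, a minimal sketch of loading and inspecting the file with pandas; the filename kaggle_datasets.csv is an assumption and should be replaced with the actual CSV name in this dataset:

```python
import pandas as pd

# Hypothetical filename; replace with the actual CSV from this dataset.
df = pd.read_csv("kaggle_datasets.csv")

print(df.shape)    # expected: (5717, 15)
print(df.dtypes)   # one column per feature listed in the table above
print(df["medal"].value_counts(dropna=False))  # distribution of medal awards
```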
I would like to thank all my GA instructors for their continuous help and support
All data were taken from https://www.kaggle.com, collected on 30 Jan 2021.
Using this dataset, we could try to predict attributes of newly uploaded datasets, such as the number of votes, number of downloads, or medal type.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 1 row and is filtered where the book is Count Draco down under. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Conceptual novelty analysis data based on PubMed Medical Subject Headings
----------------------------------------------------------------------
Created by Shubhanshu Mishra and Vetle I. Torvik on April 16th, 2018

## Introduction

This is a dataset created as part of the publication titled: Mishra S, Torvik VI. Quantifying Conceptual Novelty in the Biomedical Literature. D-Lib Magazine: The Magazine of the Digital Library Forum. 2016;22(9-10):10.1045/september2016-mishra. It contains final data generated as part of our experiments based on the MEDLINE 2015 baseline and the MeSH tree from 2015. The dataset is distributed in the form of the following tab-separated text files:

* PubMed2015_NoveltyData.tsv - Novelty scores for each paper in PubMed. The file contains 22,349,417 rows and 6 columns, as follows:
  - PMID: PubMed ID
  - Year: year of publication
  - TimeNovelty: time novelty score of the paper based on individual concepts (see paper)
  - VolumeNovelty: volume novelty score of the paper based on individual concepts (see paper)
  - PairTimeNovelty: time novelty score of the paper based on pairs of concepts (see paper)
  - PairVolumeNovelty: volume novelty score of the paper based on pairs of concepts (see paper)
* mesh_scores.tsv - Temporal profiles for each MeSH term for all years. The file contains 1,102,831 rows and 5 columns, as follows:
  - MeshTerm: name of the MeSH term
  - Year: year
  - AbsVal: total publications with that MeSH term in the given year
  - TimeNovelty: age (in years since first publication) of the MeSH term in the given year
  - VolumeNovelty: age (in number of papers since first publication) of the MeSH term in the given year
* meshpair_scores.txt.gz (36 GB uncompressed) - Temporal profiles for each MeSH pair for all years:
  - Mesh1: name of the first MeSH term (alphabetically sorted)
  - Mesh2: name of the second MeSH term (alphabetically sorted)
  - Year: year
  - AbsVal: total publications with that MeSH pair in the given year
  - TimeNovelty: age (in years since first publication) of the MeSH pair in the given year
  - VolumeNovelty: age (in number of papers since first publication) of the MeSH pair in the given year
* README.txt file

## Dataset creation

This dataset was constructed using multiple datasets described in the following locations:

* MEDLINE 2015 baseline: https://www.nlm.nih.gov/bsd/licensee/2015_stats/baseline_doc.html
* MeSH tree 2015: ftp://nlmpubs.nlm.nih.gov/online/mesh/2015/meshtrees/
* Source code provided at: https://github.com/napsternxg/Novelty

Note: The dataset is based on a snapshot of PubMed (which includes Medline and PubMed-not-Medline records) taken in the first week of October, 2016. Check here for information on getting PubMed/MEDLINE and NLM's data Terms and Conditions. Additional data-related updates can be found at: Torvik Research Group.

## Acknowledgments

This work was made possible in part with funding to VIT from NIH grant P01AG039347 and NSF grant 1348742. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

## License

Conceptual novelty analysis data based on PubMed Medical Subject Headings by Shubhanshu Mishra and Vetle Torvik is licensed under a Creative Commons Attribution 4.0 International License. Permissions beyond the scope of this license may be available at https://github.com/napsternxg/Novelty
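For orientation, a minimal pandas sketch for reading the TSV files (a sketch only: the paper-level file is large and is read in chunks here, and meshpair_scores.txt.gz is best processed in a streaming fashion; whether the files include a header row is an assumption):

```python
import pandas as pd

# Per-paper novelty scores (~22.3M rows): read in chunks to limit memory use.
# If the files have no header row, pass header=None and names=[...] explicitly.
novelty_chunks = pd.read_csv(
    "PubMed2015_NoveltyData.tsv", sep="\t", chunksize=1_000_000
)
n_rows = sum(len(chunk) for chunk in novelty_chunks)
print(f"PubMed2015_NoveltyData.tsv rows: {n_rows}")

# Per-MeSH-term temporal profiles (~1.1M rows) fit in memory comfortably.
mesh_scores = pd.read_csv("mesh_scores.tsv", sep="\t")
print(mesh_scores.columns.tolist())
```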
This dataset is made available under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). See LICENSE.pdf for details.
Dataset description
Parquet file, with:
35,694 rows
154 columns
The file is indexed on [participant]_[month], such that 34_12 means month 12 from participant 34. All participant IDs have been replaced with randomly generated integers and the conversion table deleted.
Column names and explanations are included as a separate tab-delimited file. Detailed descriptions of feature engineering are available from the linked publications.
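As a quick illustration, a minimal pandas sketch of loading the file and splitting the participant/month index (the filename features.parquet is an assumption; substitute the actual Parquet file distributed with this record):

```python
import pandas as pd

# Hypothetical filename; replace with the actual Parquet file from this record.
df = pd.read_parquet("features.parquet")
print(df.shape)  # expected: (35694, 154)

# The index is "[participant]_[month]", e.g. "34_12" = month 12 of participant 34
# (assuming the index is stored as strings in that format).
idx = df.index.to_series().str.split("_", expand=True)
df["participant_id"] = idx[0].astype(int)
df["month"] = idx[1].astype(int)
```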
File contains aggregated, derived feature matrix describing person-generated health data (PGHD) captured as part of the DiSCover Project (https://clinicaltrials.gov/ct2/show/NCT03421223). This matrix focuses on individual changes in depression status over time, as measured by PHQ-9.
The DiSCover Project is a 1-year-long longitudinal study of 10,036 individuals in the United States, who wore consumer-grade wearable devices throughout the study and completed monthly surveys about their mental health and/or lifestyle changes between January 2018 and January 2020.
The data subset used in this work comprises the following:
Wearable PGHD: step and sleep data from the participants’ consumer-grade wearable devices (Fitbit) worn throughout the study
Screener survey: prior to the study, participants self-reported socio-demographic information, as well as comorbidities
Lifestyle and medication changes (LMC) survey: every month, participants were requested to complete a brief survey reporting changes in their lifestyle and medication over the past month
Patient Health Questionnaire (PHQ-9) score: every 3 months, participants were requested to complete the PHQ-9, a 9-item questionnaire that has proven to be reliable and valid to measure depression severity
From these input sources we define a range of input features, both static (defined once, remain constant for all samples from a given participant throughout the study, e.g. demographic features) and dynamic (varying with time for a given participant, e.g. behavioral features derived from consumer-grade wearables).
The dataset contains a total of 35,694 rows, one for each month of data collection from the participants. We can generate 3-month-long, non-overlapping, independent samples to capture changes in depression status over time with PGHD. We use the notation ‘SM0’ (sample month 0), ‘SM1’, ‘SM2’ and ‘SM3’ to refer to relative time points within each sample. Each 3-month sample consists of: PHQ-9 survey responses at SM0 and SM3, one set of screener survey responses, LMC survey responses at SM3 (as well as SM1, SM2, if available), and wearable PGHD for SM3 (and SM1, SM2, if available). The wearable PGHD includes data collected from 8 to 14 days prior to the PHQ-9 label generation date at SM3. Doing this generates a total of 10,866 samples from 4,036 unique participants.
Context & Motivation
Electronic Health Records (EHR) are a cornerstone for modern healthcare analytics and machine-learning research, but real clinical data is sensitive, tightly regulated, and hard to share. To enable rapid prototyping, teaching, and multi-language experimentation without privacy concerns, we generated a synthetic, longitudinal EHR dataset in seven languages.

Contents & Structure
* 100 K total records (10 K demo per language)
* Simulates multi-visit patients over a 10-year span
* Includes 16 core clinical variables: demographics (ID, sex, age), vitals, diagnosis (ICD-10), treatments, comorbidities, outcomes, and relapse risk
* All values are entirely artificial, statistically coherent but containing no real patient information

Languages
English, Spanish, French, Portuguese, Arabic, Hindi, Russian: ready for international ML, NLP, or data-science pipelines.

Use Cases
* Quickly benchmark classification/regression models (risk prediction, outcome forecasting)
* Prototype dashboards or visualizations in any language
* Build multi-lingual NLP tools on synthetic clinical notes
* Educational labs, hackathons, or demos without GDPR/PHI hurdles

Generation & Quality
Data was simulated using Python (pandas, Faker) with realistic distributions for vital signs, diagnoses, and comorbidities (a simplified sketch follows below). We ensured each language version uses its local terminology and character set, so you can test encoding, tokenization, and locale-sensitive pipelines.

License & Access
This demo is released under a custom MIT-style license (see LICENSE.txt). For the full 100 K-row dataset with extended documentation and variable dictionaries, visit our Gumroad page or contact em@sianabox.com.
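The following is a minimal, hypothetical sketch of the kind of Faker/pandas generation described above. It is not the authors' actual script; the column names, value ranges, and ICD-10 subset are illustrative assumptions only:

```python
import random
import pandas as pd
from faker import Faker

def generate_patients(n: int = 100, locale: str = "es_ES", seed: int = 42) -> pd.DataFrame:
    """Generate n fully synthetic patient records in the given locale (illustrative schema)."""
    fake = Faker(locale)
    Faker.seed(seed)
    random.seed(seed)
    icd10_codes = ["E11", "I10", "J45", "F32", "M54"]  # illustrative subset of ICD-10 codes
    rows = []
    for i in range(n):
        rows.append({
            "patient_id": f"P{i:05d}",
            "name": fake.name(),                      # locale-aware synthetic name
            "sex": random.choice(["M", "F"]),
            "age": random.randint(18, 90),
            "systolic_bp": round(random.gauss(125, 15)),
            "diagnosis_icd10": random.choice(icd10_codes),
            "relapse_risk": round(random.random(), 2),
        })
    return pd.DataFrame(rows)

df = generate_patients(10)
print(df.head())
```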
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
A. SUMMARY
This dataset reports the number of new residential units made available for occupancy in San Francisco since January 2018. Each row in this dataset shows the change in the number of new units associated with a building permit application. Each row also includes the date those units were approved for occupancy, the type of document approving them, and their address.
Values in the column [Number of Units Certified] can be added together to produce a count of new units approved for occupancy since January 2018.
These records provide a preliminary count of new residential units. The San Francisco Planning Department issues a Housing Inventory Report each year that provides a more complete account of new residential units, and those results may vary slightly from records in this dataset. The Housing Inventory Report is an in-depth annual research project requiring extensive work to validate information about projects. By comparison, this dataset is meant to provide more timely updates about housing production based on available administrative data. The Department of Building Inspection and Planning Department will reconcile these records with future Housing Inventory Reports.
B. METHODOLOGY
At the end of each month, DBI staff manually calculate how many new units are available for occupancy for each building permit application and enter that information into this dataset. These records reflect counts for all types of residential units, including authorized accessory dwelling units. These records do not reflect units demolished or removed from the city’s available housing stock.
Multiple records may be associated with the same building permit application number, which means that new certifications or amendments were issued. Only changes to the net number of units associated with that permit application are recorded in subsequent records.
For example, Building Permit Application Number [201601010001] located at [123 1st Avenue] was issued an [Initial TCO] Temporary Certificate of Occupancy on [January 1, 2018] approving 10 units for occupancy. Then, an [Amended TCO] was issued on [June 1, 2018] approving [5] additional units for occupancy, for a total of 15 new units associated with that Building Permit Application Number. The building will appear twice in the dataset, with each row representing when new units were approved.
If additional or amended certifications are issued for a building permit application, but they do not change the number of units associated with that building permit application, those certifications are not recorded in this dataset. For example, if all new units associated with a project are certified for occupancy under an Initial TCO, then the Certificate of Final Completion (CFC) would not appear in the dataset because the CFC would not add new units to the housing stock. See data definitions for more details.
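To illustrate the counting logic above, a hedged pandas sketch; the file name is hypothetical, and the column names follow the bracketed names used in this description, so they may differ from the exact field names in the published dataset:

```python
import pandas as pd

# Hypothetical file/column names based on the description above.
df = pd.read_csv("dbi_units_certified.csv")

# Net new units per building permit application (Initial TCO + Amended TCO + CFC rows).
per_permit = df.groupby("Building Permit Application Number")["Number of Units Certified"].sum()

# Citywide count of new units approved for occupancy since January 2018.
total_new_units = df["Number of Units Certified"].sum()
print(total_new_units)
```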
C. UPDATE FREQUENCY
This dataset is updated monthly.
D. DOCUMENT TYPES
Several documents issued near or at project completion can certify units for occupancy. They are: Initial Temporary Certificate of Occupancy (TCO), Amended TCO, and Certificate of Final Completion (CFC).
• Initial TCO is a document that allows for occupancy of a unit before final project completion is certified, conditional on the unit being safe to occupy. The TCO is meant to be temporary and has an expiration date. This field represents the number of units certified for occupancy when the TCO is issued.
• Amended TCO is a document that is issued when the conditions of the project change before final project completion is certified. These records show additional new units that have become habitable since the issuance of the Initial TCO.
• Certificate of Final Completion (CFC) is a document that is issued when all work is completed according to approved plans and the building is ready for complete occupancy. These records show additional new units that were not accounted for in the Initial or Amended TCOs.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Tissue Cell Raw P Dataset
This dataset was processed using the parq2hug tool on 2025-06-15.
Dataset Information
Rows: 20,116
Columns: 195
File Size: 25.91 MB
File Structure
expression.parquet
expression.parquet contains the main dataset with 20,116 rows and 195 columns.
feature_metadata.parquet
feature_metadata.parquet contains metadata for each feature (column) in the dataset, including:
Feature name Data type Statistics (count, mean… See the full description on the dataset page: https://huggingface.co/datasets/longevity-db/tissue_cell_raw_p_dataset_new.
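A minimal sketch of loading the two Parquet files with pandas after downloading them from the dataset repository via huggingface_hub (assuming the file names match those listed above):

```python
import pandas as pd
from huggingface_hub import hf_hub_download

# Download the two Parquet files from the dataset repository
# (file names assumed to match those listed above).
repo = "longevity-db/tissue_cell_raw_p_dataset_new"
expr_path = hf_hub_download(repo_id=repo, filename="expression.parquet", repo_type="dataset")
meta_path = hf_hub_download(repo_id=repo, filename="feature_metadata.parquet", repo_type="dataset")

expression = pd.read_parquet(expr_path)        # expected shape: (20116, 195)
feature_metadata = pd.read_parquet(meta_path)  # per-column metadata
print(expression.shape, feature_metadata.shape)
```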
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
This dataset includes Point-in-Time (PIT) data collected in Cambridge between 2012 and 2024. The PIT count is a count of sheltered and unsheltered homeless persons on a single night in January. The U.S. Department of Housing and Urban Development (HUD) requires that communities receiving funding through the Continuum of Care (CoC) Program conduct an annual count of homeless persons on a single night in the last 10 days of January, and these data contribute to national estimates of homelessness reported in the Annual Homeless Assessment Report to the U.S. Congress. This dataset is comprised of data submitted to, and stored in, HUD’s Homelessness Data Exchange (HDX).
This dataset includes basic counts and demographic information of persons experiencing homelessness on each PIT date from 2012-2024. The dataset contains four rows for each year, including one row for each housing type: Emergency Shelter, Transitional Housing, or Unsheltered. The dataset also includes housing inventory counts of the number of shelter and transitional housing units available on each of the PIT count dates.
Information about persons staying in emergency shelters and transitional housing units is exported from the Homeless Management Information System (HMIS), which is the primary database for recording client-level service records. Information about persons in unsheltered situations is compiled by first conducting an overnight street count of persons observed sleeping outdoors on the PIT night to establish the total number of unsheltered persons. Demographic information for unsheltered persons is then extrapolated by utilizing assessment data collected by street outreach workers during the 7 days following the PIT count.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository includes two datasets, a Document-Term Matrix and associated metadata, for 17,493 New York Times articles covering protest events, each saved as a single R object.
These datasets are based on the original Dynamics of Collective Action (DoCA) dataset (Wang and Soule 2012; Earl, Soule, and McCarthy). The original DoCA dataset contains variables for protest events referenced in roughly 19,676 New York Times articles reporting on collective action events occurring in the US between 1960 and 1995. Data were collected as part of the Dynamics of Collective Action Project at Stanford University. Research assistants read every page of all daily issues of the New York Times to find descriptions of 23,624 distinct protest events. The text of the news articles was not included in the original DoCA data.
We attempted to recollect the raw text in a semi-supervised fashion by matching article titles to create the Dynamics of Collective Action Corpus. In addition to hand-checking random samples and hand-collecting some articles (specifically, in the case of false positives), we also used some automated matching processes to ensure the recollected article titles matched their respective titles in the DoCA dataset. The final number of recollected and matched articles is 17,493.
We then subset the original DoCA dataset to include only rows that match a recollected article. The "20231006_dca_metadata_subset.Rds" contains all of the metadata variables from the original DoCA dataset (see Codebook), with the addition of "pdf_file" and "pub_title", the latter being the title of the recollected article (which may differ from the "title" variable in the original dataset), for a total of 106 variables and 21,126 rows (noting that a row is a distinct protest event and one article may cover more than one protest event).
Once collected, we prepared these texts using typical preprocessing procedures (and some less typical procedures, which were necessary given that these were OCRed texts). We followed these steps in this order: We removed headers and footers that were consistent across all digitized stories and any web links or HTML; added a single space before an uppercase letter when it was flush against a lowercase letter to its right (e.g., turning "JohnKennedy" into "John Kennedy"); removed excess whitespace; converted all characters to the broadest range of Latin characters and then transliterated to "Basic Latin" ASCII characters; replaced curly quotes with their ASCII counterparts; replaced contractions (e.g., turned "it's" into "it is"); removed punctuation; removed capitalization; removed numbers; fixed word kerning; applied a final extra round of whitespace removal.
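A hedged Python sketch of the kind of cleaning pipeline described above (this is not the authors' original code, which operated on the full OCRed corpus; the contraction list here is an illustrative subset):

```python
import re
import unicodedata

CONTRACTIONS = {"it's": "it is", "don't": "do not", "can't": "cannot"}  # illustrative subset

def preprocess(text: str) -> str:
    # Split words fused by OCR: add a space when an uppercase letter follows a lowercase one.
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    # Transliterate to basic ASCII (drops curly quotes and other non-Latin characters).
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Replace contractions.
    for short, full in CONTRACTIONS.items():
        text = re.sub(re.escape(short), full, text, flags=re.IGNORECASE)
    # Remove punctuation and numbers, lowercase, collapse whitespace.
    text = re.sub(r"[^\sA-Za-z]", " ", text).lower()
    return re.sub(r"\s+", " ", text).strip()

print(preprocess("JohnKennedy said it's over in 1963."))
# -> "john kennedy said it is over in"
```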
We then tokenized them by following the rule that each word is a character string surrounded by a single space. At this step, each document is then a list of tokens. We count each unique token to create a document-term matrix (DTM), where each row is an article, each column is a unique token (occurring at least once in the corpus as a whole), and each cell is the number of times each token occurred in each article. Finally, we removed words (i.e., columns in the DTM) that occurred fewer than four times in the corpus as a whole or were only a single character in length (likely orphaned characters from the OCRing process). The final DTM has 66,552 unique words, 10,134,304 total tokens, and 17,493 documents. The "20231006_dca_dtm.Rds" is a sparse matrix class object from the Matrix R package.
In R, use the load() function to load the objects `dca_dtm` and `dca_meta`. To associate `dca_meta` with `dca_dtm`, match the "pdf_file" variable in `dca_meta` to the rownames of `dca_dtm`.
As part of NASA's Making Earth System Data Records for Use in Research Environments (MEaSUREs) program, this project, entitled “Multi-Decadal Nitrogen Dioxide and Derived Products from Satellites (MINDS)”, will develop consistent long-term global trend-quality data records spanning the last two decades, over which remarkable changes in nitrogen oxides (NOx) emissions have occurred. The objective of the project is to adapt Ozone Monitoring Instrument (OMI) operational algorithms to other satellite instruments and create consistent multi-satellite L2 and L3 nitrogen dioxide (NO2) columns and value-added L4 surface NO2 concentrations and NOx emissions data products, systematically accounting for instrumental differences. The instruments include the Global Ozone Monitoring Experiment (GOME, 1996-2011), the SCanning Imaging Absorption spectroMeter for Atmospheric CHartographY (SCIAMACHY, 2002-2012), OMI (2004-present), GOME-2 (2007-present), and the TROPOspheric Monitoring Instrument (TROPOMI, 2018-present). The quality-assured L2-L4 products will be made available to the scientific community via the NASA GES DISC website in Climate and Forecast (CF)-compliant Hierarchical Data Format (HDF5) and netCDF formats.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about universities in Chile. It has 16 rows. It features 2 columns including total students.
This dataset features over 4,000 customer reviews of Dermalogica cleansing exfoliators, all sourced from Ulta.com. It was compiled on 27 March 2023 using Python libraries, specifically designed for Natural Language Processing (NLP) tasks. The dataset provides valuable insights into customer opinions and product performance, making it ideal for various analytical applications.
The dataset contains over 4,000 individual reviews. Data was scraped on 27 March 2023. For the 'Verified_Buyer' column, there are 1,249 (30%) 'true' entries and 2,901 (70%) 'false' entries. Key product mentions are evenly split between 'Daily Superfoliant' and 'Daily Microfoliant', each at 36%. Review dates are categorised into "2 years ago" (22%), "1 year ago" (20%), and other periods (58%). Reviewer locations are largely "Undisclosed" (22%) or other (75%), with a small percentage (3%) from "Los Angeles". Specific numbers for rows or records beyond the total review count are not explicitly detailed, but metrics for upvotes and downvotes are available.
This dataset is particularly useful for:
* Sentiment Analysis: Determining the overall positive or negative sentiments associated with each Dermalogica product (see the sketch after this list).
* Text Analysis: Extracting insights from review texts, such as common skincare concerns addressed by the products or issues they helped resolve or worsen.
* Inferential Statistics: Analysing statistically significant differences in average sentiment scores across different product reviews.
* Data Visualisation: Creating visual representations like bar plots or word clouds to highlight frequently used words or phrases in relation to specific products.
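For example, a minimal sentiment-scoring sketch with NLTK's VADER analyzer; the file name and the Review_Text and Product column names are assumptions about this dataset's layout and should be adjusted to the actual schema:

```python
import pandas as pd
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the VADER lexicon

# Hypothetical file/column names; adjust to the actual CSV schema.
reviews = pd.read_csv("ulta_dermalogica_reviews.csv")
sia = SentimentIntensityAnalyzer()
reviews["sentiment"] = reviews["Review_Text"].astype(str).map(
    lambda text: sia.polarity_scores(text)["compound"]
)

# Average sentiment per product, e.g. Daily Superfoliant vs. Daily Microfoliant.
print(reviews.groupby("Product")["sentiment"].mean())
```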
The data encompasses customer reviews of Dermalogica cleansing exfoliators published on Ulta.com. Geographically, while many reviewer locations are undisclosed, some specific cities like Los Angeles are noted, and the dataset is broadly considered to have a global reach. The time range of the reviews extends back from the data scrape date of 27 March 2023, with reviews published up to two years prior. No specific demographic breakdown is provided, though the 'Verified_Buyer' flag offers a binary indication of purchase confirmation.
CC-BY
This dataset is beneficial for a range of professionals and organisations, including:
* Data Scientists and NLP Engineers: For developing and testing natural language processing models.
* Market Researchers: To understand customer feedback, identify market trends, and assess product performance within the skincare industry.
* Skincare Brands: For gaining insights into customer satisfaction, identifying product strengths and weaknesses, and informing product development strategies.
* Academics and Students: For research projects focused on consumer behaviour, text analytics, or machine learning applications in e-commerce.
Original Data Source: NLP: Ulta Skincare Reviews
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context: Communication and mutual trust are key drivers of effective teamwork in human teams. In human-AI teams, i.e. teams composed of both humans and artificial agents, communication and trust are also important. In this research project, we investigated how different forms of artificial agent communication affect human trust and satisfaction in such teams. Participants teamed up with artificial agents in an online setting (using a 2D grid world) and their decisions were logged. This dataset includes different metrics calculated from the logs, self-reported questionnaire answers on trust and satisfaction, and free answers to open questions.
This dataset was created during the Research Project course of the Computer Science Bachelor's programme at Delft University of Technology, supervised by Carolina Jorge and Dr. Myrthe Tielman. Five students ran a user study with six different conditions (the baseline and five new conditions, one developed by each student). The full description of the user study and their individual results (i.e., pairwise comparisons between their own condition and the baseline) can be found in each of their theses, linked on this page below.
Then, a full joint dataset was created; it can be found in "Full dataset.csv" (140 rows in total). To balance the number of participants per condition, we generated "capped_dataset.csv" with 20 rows per condition (total N=120). We analysed differences among conditions and reran the pairwise comparisons on "capped_dataset.csv". The code can be found in "Quantitative Analysis.ipynb". These results are to be published in a paper; the author contributions can be found in "author_contribution.txt".
The full code used for the generation of this dataset can be found in this Github repository: https://github.com/centeio/AT-Communication
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
-------------------------------------------------------------------------------------------------------------
CITATION
-------------------------------------------------------------------------------------------------------------
Please cite this data and code as:
H. Khamis, R. Weiss, Y. Xie, C-W. Chang, N. H. Lovell, S. J. Redmond, "QRS detection algorithm for telehealth electrocardiogram recordings," IEEE Transactions on Biomedical Engineering, vol. 63(7), p. 1377-1388, 2016.

-------------------------------------------------------------------------------------------------------------
DATABASE DESCRIPTION
-------------------------------------------------------------------------------------------------------------
The following description of the TELE database is from Khamis et al. (2016):

"In Redmond et al. (2012), 300 ECG single lead-I signals recorded in a telehealth environment are described. The data was recorded using the TeleMedCare Health Monitor (TeleMedCare Pty. Ltd., Sydney, Australia). This ECG is sampled at a rate of 500 Hz using dry metal Ag/AgCl plate electrodes which the patient holds with each hand; a reference electrode plate is also positioned under the pad of the right hand. Of the 300 recordings, 250 were selected randomly from 120 patients, and the remaining 50 were manually selected from 168 patients to obtain a larger representation of poor quality data. Three independent scorers annotated the data by identifying sections of artifact and QRS complexes. All scorers then annotated the signals as a group, to reconcile the individual annotations. Sections of the ECG signal which were less than 5 s in duration were considered to be part of the neighboring artifact sections and were subsequently masked. QRS annotations in the masked regions were discarded prior to the artifact mask and QRS locations being saved. Of the 300 telehealth ECG records in Redmond et al. (2012), 50 records (including 29 of the 250 randomly selected records and 21 of the 50 manually selected records) were discarded as all annotated RR intervals within these records overlap with the annotated artifact mask and therefore no heart rate can be calculated, which is required for measuring algorithm performance. The remaining 250 records will be referred to as the TELE database."

For all 250 recordings in the TELE database, the mains frequency was 50 Hz, the sampling frequency was 500 Hz, and the top and bottom rail voltages were 5.556912223578890 mV and -5.554198887532222 mV respectively.

-------------------------------------------------------------------------------------------------------------
DATA FILE DESCRIPTION
-------------------------------------------------------------------------------------------------------------
Each record in the TELE database is stored as an X_Y.dat file, where X indicates the index of the record in the TELE database (containing a total of 250 records) and Y indicates the index of the record in the original dataset containing 300 records (see Redmond et al. 2012).

The .dat file is a comma-separated values file. Each line contains:
- the ECG sample value (mV)
- a boolean indicating the locations of the annotated QRS complexes
- a boolean indicating the visually determined mask
- a boolean indicating the software determined mask (see Khamis et al. 2016)

-------------------------------------------------------------------------------------------------------------
CONVERTING DATA TO MATLAB STRUCTURE
-------------------------------------------------------------------------------------------------------------
A matlab function (readFromCSV_TELE.m) has been provided to read the .dat files into a matlab structure:

%%
% [DB,fm,fs,rail_mv] = readFromCSV_TELE(DATA_PATH)
%
% Extracts the data for each of the 250 telehealth ECG records of the TELE database [1]
% and returns a structure containing all data, annotations and masks.
%
% IN:  DATA_PATH - String. The path containing the .hdr and .dat files
%
% OUT: DB - 1xM Structure. Contains the extracted data from the M (250) data files.
%           The structure has fields:
%           * data_orig_ind - 1x1 double. The index of the data file in the original dataset of 300 records (see [1]) - for tracking purposes.
%           * ecg_mv - 1xN double. The ecg samples (mV). N is the number of samples for the data file.
%           * qrs_annotations - 1xN double. The qrs complexes - value of 1 where a qrs is located and 0 otherwise.
%           * visual_mask - 1xN double. The visually determined artifact mask - value of 1 where the data is masked and 0 otherwise.
%           * software_mask - 1xN double. The software artifact mask - value of 1 where the data is masked and 0 otherwise.
%      fm - 1x1 double. The mains frequency (Hz)
%      fs - 1x1 double. The sampling frequency (Hz)
%      rail_mv - 1x2 double. The bottom and top rail voltages (mV)
%
% If you use this code or data, please cite as follows:
%
% [1] H. Khamis, R. Weiss, Y. Xie, C-W. Chang, N. H. Lovell, S. J. Redmond,
% "QRS detection algorithm...
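For users outside MATLAB, a hedged Python equivalent for reading one record; the file name is illustrative, and the column order follows the .dat layout described above:

```python
import numpy as np

# Illustrative file name: record 1 of the TELE database, record 10 of the original 300.
data = np.loadtxt("1_10.dat", delimiter=",")

ecg_mv = data[:, 0]                       # ECG samples in mV
qrs = data[:, 1].astype(bool)             # annotated QRS complex locations
visual_mask = data[:, 2].astype(bool)     # visually determined artifact mask
software_mask = data[:, 3].astype(bool)   # software-determined artifact mask

fs = 500  # Hz, per the database description
qrs_times_s = np.flatnonzero(qrs) / fs
print(f"{qrs.sum()} annotated QRS complexes in {len(ecg_mv) / fs:.1f} s of ECG")
```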
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper explores a unique dataset of all the SET ratings provided by students of one university in Poland at the end of the winter semester of the 2020/2021 academic year. The SET questionnaire used by this university is presented in Appendix 1. The dataset is unique for several reasons. It covers all SET surveys filled in by students in all fields and levels of study offered by the university. In the period analysed, the university was entirely in the online regime amid the Covid-19 pandemic. While the expected learning outcomes formally have not been changed, the online mode of study could have affected the grading policy and could have implications for some of the studied SET biases. This Covid-19 effect is captured by econometric models and discussed in the paper.

The average SET scores were matched with the characteristics of the teacher (degree, seniority, gender, and SET scores in the past six semesters); the course characteristics (time of day, day of the week, course type, course breadth, class duration, and class size); the attributes of the SET survey responses (the percentage of students providing SET feedback); and the grades of the course (mean, standard deviation, and percentage failed). Data on course grades are also available for the previous six semesters. This rich dataset allows many of the biases reported in the literature to be tested for and new hypotheses to be formulated, as presented in the introduction section.

The unit of observation, or a single row in the data set, is identified by three parameters: teacher unique id (j), course unique id (k), and the question number in the SET questionnaire (n ϵ {1, 2, 3, 4, 5, 6, 7, 8, 9}). It means that for each pair (j,k) we have nine rows, one for each SET survey question, or sometimes fewer when students did not answer one of the SET questions at all. For example, the dependent variable SET_score_avg(j,k,n) for the triplet (j = John Smith, k = Calculus, n = 2) is calculated as the average of all Likert-scale answers to question nr 2 in the SET survey distributed to all students that took the Calculus course taught by John Smith. The data set has 8,015 such observations or rows.

The full list of variables or columns in the data set included in the analysis is presented in the attached file section. Their description refers to the triplet (teacher id = j, course id = k, question number = n). When the last value of the triplet (n) is dropped, it means that the variable takes the same values for all n ϵ {1, 2, 3, 4, 5, 6, 7, 8, 9}.

Two attachments:
- Word file with variables description
- Rdata file with the data set (for R language)

Appendix 1. The SET questionnaire used for this paper.

Evaluation survey of the teaching staff of [university name]

Please complete the following evaluation form, which aims to assess the lecturer's performance. Only one answer should be indicated for each question. The answers are coded in the following way: 5 - I strongly agree; 4 - I agree; 3 - Neutral; 2 - I don't agree; 1 - I strongly don't agree.

1. I learnt a lot during the course.
2. I think that the knowledge acquired during the course is very useful.
3. The professor used activities to make the class more engaging.
4. If it was possible, I would enroll for the course conducted by this lecturer again.
5. The classes started on time.
6. The lecturer always used time efficiently.
7. The lecturer delivered the class content in an understandable and efficient way.
8. The lecturer was available when we had doubts.
9. The lecturer treated all students equally regardless of their race, background and ethnicity.
See "About" for field info. This dataset tracks the speed the city responds to public records requests.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains DNS records, IP-related features, WHOIS/RDAP information, information from TLS certificate fields, and GeoIP information for 432,572 verified benign domains from Cisco Umbrella and 36,993 verified phishing domains from PhishTank and OpenPhish services. The dataset is useful for statistical analysis of domain data or feature extraction for training machine learning-based classifiers, e.g. for phishing detection. The data was collected between March and July 2023. The final assessment of the data was conducted in July 2023 (this is why the names are suffixed with _2307).
The upload contains: a) data files, b) the description of the data structure, and c) the feature vector we used for ML-based phishing domain detection.
The data is located in two individual files:
Both files are in the JSON Array format. The structure is as follows:
[
{
"_id" : "A unique ID of the data record",
"domain_name" : "Name of the domain (e.g., zenodo.com)",
"dns" : { "//": "Data obtained from DNS records" },
"evaluated_on" : "// ISO Timestamp of data collection ",
"ip_data" : [ "// Data for each related IP adddress ",
{
"//": "IP-related data, including RTT from ICMP echo attempts (from Brno, Czechia)",
"//": "WHOIS/RDAP data for the given IP address",
"//": "GeoIP data for the given IP address",
"//": "NERD system reputation score (if available)",
"//": "ASN info",
"//": "remarks: ISO timestamps of collection of the individual data pieces"
},
],
"label" : "benign_2307 for benign OR misp_2307 for phishing",
"rdap" : { "//": "WHOIS/RDAP information for the domain name" },
"remarks" : {
"dns_evaluated_on" : "ISO Timestamp of DNS data collection",
"rdap_evaluated_on" : "ISO Timestamp of WHOIS/RDAP data collection",
"tls_evaluated_on" : "ISO Timestamp of TLS certificate information collection",
"dns_had_no_ips" : "true if no IPs were found in DNS records"
},
"sourced_on" : "ISO Timestamp of the moment the domain was found",
"tls" : {
"cipher" : "Identifier of the TLS cipher suite",
"count" : "Number of certificates in chain",
"protocol" : "Version of the TLS protocol",
"certificates" : [
"//": "Information from TLS certificate fields: issuer, extensions, etc."
]
},
"category" : "Category of the record (could be ignored)",
"source" : "Name of the file that we used to save the domain list"
}
]
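A minimal sketch of loading one of the JSON files and iterating over the records; the file name benign_2307.json is an assumption, so use the actual file names from this upload:

```python
import json

# Hypothetical file name; the upload contains two JSON Array files (benign and phishing).
with open("benign_2307.json", encoding="utf-8") as f:
    records = json.load(f)  # a list of domain records with the structure shown above

print(len(records))
for record in records[:3]:
    domain = record["domain_name"]
    label = record["label"]               # "benign_2307" or "misp_2307"
    n_ips = len(record.get("ip_data") or [])
    print(domain, label, n_ips)
```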
This section describes the feature vector used in the "Unmasking the Phishermen: Phishing Domain Detection with Machine Learning and Multi-Source Intelligence" paper that was accepted to the IEEE NOMS 2024 conference.
The following features were extracted from the sole domain name:
The following features were extracted from DNS responses when querying about the domain:
These features were derived from IP addresses and ICMP echo replies:
The following features were extracted from TLS certificate chains and TLS handshakes:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A 10,000-patient database that contains in total 10,000 virtual patients, 36,143 admissions, and 10,726,505 lab observations.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the book is Count the ways : the greatest love stories of our time. It features 7 columns including author, publication date, language, and book publisher.
This dataset tracks the number of days since the row count on a dataset asset has changed. Its purpose is to ensure datasets are updating as expected. This dataset is identical to the Socrata Asset Inventory, with added Checkpoint Date and Days Since Row Count Change attributes.