This dataset provides information about the number of properties, residents, and average property values for C Street cross streets in Reading, MA.
RACE-C: additional reading-comprehension data in the style of RACE (which covers high school and middle school exams), but at college level. https://github.com/mrcdata/race-c

@InProceedings{pmlr-v101-liang19a,
  title     = {A New Multi-choice Reading Comprehension Dataset for Curriculum Learning},
  author    = {Liang, Yichan and Li, Jianheng and Yin, Jian},
  booktitle = {Proceedings of The Eleventh Asian Conference on Machine Learning},
  pages     = {742--757},
  year      = {2019}
}
NNDSS - TABLE 1Q. Hepatitis B, perinatal infection to Hepatitis C, acute, Probable - 2022. In this Table, provisional cases* of notifiable diseases are displayed for United States, U.S. territories, and Non-U.S. residents.

Notes:
• These are weekly cases of selected infectious national notifiable diseases, from the National Notifiable Diseases Surveillance System (NNDSS). NNDSS data reported by the 50 states, New York City, the District of Columbia, and the U.S. territories are collated and published weekly as numbered tables available at https://www.cdc.gov/nndss/data-statistics/index.html. Cases reported by state health departments to CDC for weekly publication are subject to ongoing revision of information and delayed reporting. Therefore, numbers listed in later weeks may reflect changes made to these counts as additional information becomes available. Case counts in the tables are presented as published each week. See also the Guide to Interpreting Provisional and Finalized NNDSS Data at https://www.cdc.gov/nndss/docs/Readers-Guide-WONDER-Tables-20210421-508.pdf.
• Notices, errata, and other notes are available on the Notice To Data Users page at https://wonder.cdc.gov/nndss/NTR.html.
• The list of national notifiable infectious diseases and conditions and their national surveillance case definitions is available at https://ndc.services.cdc.gov/. This list incorporates the Council of State and Territorial Epidemiologists (CSTE) position statements approved by CSTE for national surveillance.

Footnotes:
*Case counts for reporting years 2021 and 2022 are provisional and subject to change. Cases are assigned to the reporting jurisdiction submitting the case to NNDSS, if the case's country of usual residence is the U.S., a U.S. territory, unknown, or null (i.e., country not reported); otherwise, the case is assigned to the 'Non-U.S. Residents' category. Country of usual residence is currently not reported by all jurisdictions or for all conditions. For further information on interpretation of these data, see https://www.cdc.gov/nndss/docs/Readers-Guide-WONDER-Tables-20210421-508.pdf.
†Previous 52 week maximum and cumulative YTD are determined from periods of time when the condition was reportable in the jurisdiction (i.e., may be less than 52 weeks of data or incomplete YTD data).
U: Unavailable — The reporting jurisdiction was unable to send the data to CDC or CDC was unable to process the data.
-: No reported cases — The reporting jurisdiction did not submit any cases to CDC.
N: Not reportable — The disease or condition was not reportable by law, statute, or regulation in the reporting jurisdiction.
NN: Not nationally notifiable — This condition was not designated as being nationally notifiable.
NP: Nationally notifiable but not published.
NC: Not calculated — There is insufficient data available to support the calculation of this statistic.
Cum: Cumulative year-to-date counts.
Max: Maximum case count during the previous 52 weeks.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
César E. Corona-González, Claudia Rebeca De Stefano-Ramos, Juan Pablo Rosado-Aíza, Fabiola R Gómez-Velázquez, David I. Ibarra-Zarate, Luz María Alonso-Valerdi
César E. Corona-González
https://orcid.org/0000-0002-7680-2953
a00833959@tec.mx
Psychophysiological data from Mexican children with learning difficulties who strengthen reading and math skills by assistive technology
2023
The current dataset consists of psychometric and electrophysiological data from children with reading or math learning difficulties. These data were collected to evaluate improvements in reading or math skills resulting from using an online learning method called Smartick.
The psychometric evaluations for children with reading difficulties encompassed: spelling tests, where 1) orthographic and 2) phonological errors were considered; 3) reading speed, expressed in words read per minute; and 4) reading comprehension, where multiple-choice questions were given to the children. The last two parameters were determined according to the standards of the Ministry of Public Education (Secretaría de Educación Pública in Spanish) in Mexico. The assessments for the math difficulties group comprised: 1) an assessment of general mathematical knowledge, as well as 2) the percentage of hits and 3) the reaction time in an arithmetical task. Additionally, selective attention and intelligence quotient (IQ) were also evaluated.
Then, individuals underwent an EEG experimental paradigm in which two conditions were recorded: 1) a 3-minute eyes-open resting state and 2) performing either reading or mathematical activities. EEG recordings from the reading experiment consisted of reading a text aloud and then answering questions about the text. EEG recordings from the math experiment involved solving two blocks of 20 arithmetic operations (addition and subtraction). Subsequently, each child was randomly assigned to either 1) the experimental group, who were asked to engage with Smartick for three months, or 2) the control group, who were not involved with the intervention. Once the 3-month period was over, every child was reassessed as described above.
The dataset contains a total of 76 subjects (sub-), from two study groups: 1) reading difficulties (R) and 2) math difficulties (M). Each individual was then subcategorized into the experimental subgroup (e), where children committed to engage with Smartick, or the control subgroup (c), where they did not get involved with any intervention.
Every subject was followed up on for three months. During this period, each subject underwent two EEG sessions, representing the PRE-intervention (ses-1) and the POST-intervention (ses-2).
The EEG recordings from the reading difficulties group comprised a resting-state condition (run-1) and active reading plus reading-comprehension activities (run-2). EEG data from the math difficulties group were collected during a resting-state condition (run-1) and while solving two blocks of 20 arithmetic operations (run-2 and run-3). All EEG files were stored in .set format. The nomenclature and description of the filenames are shown below:
| Nomenclature | Description |
|---|---|
| sub- | Subject |
| M | Math group |
| R | Reading group |
| c | Control subgroup |
| e | Experimental subgroup |
| ses-1 | PRE-intervention |
| ses-2 | POST-Intervention |
| run-1 | EEG for baseline |
| run-2 | EEG for reading activity, or the first block of math |
| run-3 | EEG for the second block of math |
Example: the file sub-Rc11_ses-1_task-SmartickDataset_run-2_eeg.set corresponds to:
- the 11th subject from the reading difficulties group, control subgroup (sub-Rc11)
- the EEG recording from the PRE-intervention (ses-1) while performing the reading activity (run-2)
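Since .set is EEGLAB's native format, any EEGLAB-compatible reader works. A minimal Python sketch using MNE-Python, assuming the example file above sits in the working directory:

```python
import mne

# Load the PRE-intervention reading-activity recording of subject Rc11
# (filename taken from the example above; preload=True pulls signals into memory).
raw = mne.io.read_raw_eeglab("sub-Rc11_ses-1_task-SmartickDataset_run-2_eeg.set", preload=True)
print(raw.info)          # channel names, sampling rate, etc.
data, times = raw[:, :]  # full signal array plus its time axis
```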
Psychometric data from the reading difficulties group:
Psychometric data from the math difficulties group:
Psychometric data can be found in the 01_Psychometric_Data.xlsx file
Engagement percentage can be found in the 05_SessionEngagement.xlsx file
Seventy-six Mexican children between 7 and 13 years old were enrolled in this study.
The sample was recruited through non-profit foundations that support learning and foster care programs.
g.USBamp RESEARCH amplifier
The stimuli nested folder contains all stimuli employed in the EEG experiments.
Level 1
- Math: Images used in the math experiment.
- Reading: Images used in the reading experiment.
Level 2
- Math
* POST_Operations: arithmetic operations from the POST-intervention.
* PRE_Operations: arithmetic operations from the PRE-intervention.
- Reading
* POST_Reading1: text 1 and text-related comprehension questions from the POST-intervention.
* POST_Reading2: text 2 and text-related comprehension questions from the POST-intervention.
* POST_Reading3: text 3 and text-related comprehension questions from the POST-intervention.
* PRE_Reading1: text 1 and text-related comprehension questions from the PRE-intervention.
* PRE_Reading2: text 2 and text-related comprehension questions from the PRE-intervention.
* PRE_Reading3: text 3 and text-related comprehension questions from the PRE-intervention.
Level 3
- Math
  * Operation01.jpg to Operation20.jpg: arithmetical operations solved during the first block of the math
```python
import cv2

# Read an original image and its corresponding ground-truth annotation.
original_image = cv2.imread('Original image/IMG-001.png')
ground_truth_image = cv2.imread('Ground truth/GT-001.png', cv2.IMREAD_GRAYSCALE)
```

When training models with deep learning frameworks (such as TensorFlow or PyTorch), the dataset path can be configured in the framework's dataset-loading class, according to its data loading mechanism, to ensure that the model can correctly read and process the images and their annotation data.
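For PyTorch specifically, here is a minimal Dataset sketch under the folder layout above; the class name and the IMG-to-GT filename pairing rule are illustrative assumptions, not part of the dataset's documentation:

```python
import os

import cv2
import torch
from torch.utils.data import Dataset


class SegmentationPairs(Dataset):
    """Pairs images in 'Original image/' with masks in 'Ground truth/'."""

    def __init__(self, root):
        self.image_dir = os.path.join(root, "Original image")
        self.mask_dir = os.path.join(root, "Ground truth")
        self.names = sorted(os.listdir(self.image_dir))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]                 # e.g. IMG-001.png
        mask_name = name.replace("IMG", "GT")  # assumed pairing: GT-001.png
        image = cv2.imread(os.path.join(self.image_dir, name))
        mask = cv2.imread(os.path.join(self.mask_dir, mask_name), cv2.IMREAD_GRAYSCALE)
        # HWC uint8 -> CHW float in [0, 1]; masks stay integer class labels.
        image = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0
        mask = torch.from_numpy(mask).long()
        return image, mask
```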
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LifeSnaps Dataset Documentation
Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in-the-wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral and psychological measurements, due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset containing a plethora of anthropological data, collected unobtrusively over a total course of more than 4 months by n=71 participants under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types, from second-level to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data openly available to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.
The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.
Data Import: Reading CSV
For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
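For example (the filename below is a placeholder; use the daily or hourly CSV you downloaded):

```python
import pandas as pd

daily = pd.read_csv("lifesnaps_daily.csv")  # hypothetical filename
print(daily.head())
```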
Data Import: Setting up a MongoDB (Recommended)
To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.
To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have the MongoDB Database Tools installed.
For the Fitbit data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c fitbit
For the SEMA data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c sema
For surveys data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c surveys
If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
Data Availability
The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:
{
_id:
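Once restored, the collections can be queried from Python with pymongo; a short sketch, assuming the local connection used in the mongorestore commands above:

```python
from pymongo import MongoClient

db = MongoClient("localhost", 27017)["rais_anonymized"]

doc = db["fitbit"].find_one()           # one sample document, in the format sketched above
count = db["sema"].count_documents({})  # size of the SEMA collection
print(doc, count)
```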
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is an extensive eye tracking dataset of 102 participants, each reading three Dutch literary short stories (7790 words per participant). The preprocessed data set includes: (1) a Fixation report (fixation-level); (2) a Saccade report; (3) an Interest Area report (word-level); (4) a Trial report (aggregated data for each page; stories were split up into 30 pages each); (5) a Sample report (the data were sampled at 500 Hz; this report includes data on all individual samples); (6) questionnaire data on reading experiences and other participant characteristics; and (7) word characteristics for all words in the stories (with the potential of calculating additional word characteristics).

The study for which this data set was collected explored the effect of simulation on reading behavior by means of eye tracking. We hypothesized (A) that simulation would lead to longer fixation times for parts of the text high in simulation-eliciting content. Additionally, we hypothesized (B) that we would find personal preferences in the reaction to different types of simulation-eliciting content. We expected (C) that the findings from the eye tracking data would be related to self-report of simulation. Finally, we expected (D) that the amount of simulation would be predictive of self-report of appreciation.

We found (A) longer reading times for perceptual and mental event simulation, but shorter reading times for motor simulation. The strength of the relationship between simulation and reading times varied between participants, but was positively correlated across the different types of simulation-eliciting content (B). Regarding (C) and (D), we found that this variation in the strength of the relationship between simulation and reading times was indeed related to aspects of self-reported simulation, absorption, and appreciation.

The findings from this study are described in detail in Mak & Willems (2019). https://doi.org/10.1080/23273798.2018.1552007
dataPOEM.csv

The dataPOEM.csv data set contains data on the level of each poem.
- scoresAes = factor scores of moving, beauty, and melodious ratings
- participant = participant number
- poemVersion = version of poem presented (A = original poem with rhyme and meter, B = poem variant with only rhyme, C = poem variant with only meter, D = poem variant without rhyme and meter)
- poemIdentity = poem number
- avgWFreq = average word frequency of poem
- totalGazeSlopeLineLength
- totalGazeWordMeanNAByWordLen
- totalGazeWordMeanNADiff
- order = order of presentation (1 = from A to D, 2 = from D to A; between-participant factor)
- firstFixDurMS_MINFIX_AVG = first fixation duration
- totalGazeMS_MINFIX_AVG = total gaze durations
- fixDurMS_MINFIX_NUM = number of fixations
- sacLenMS_MINFIX_AVG = average saccade length
- percRegMS_MINFIX_AVG = percentage of regressive eye movements
- pupilDial_AVG = average pupil dilation
- blink_NUM_TotalRT = number of blinks relative to total reading time
- totalReadingTime = total reading time of the poem
- areaTT = total score of the Aesthetic Responsiveness Assessment questionnaire
- dataIntegrity = percentage of valid position measurements by eye tracker during reading of a poem
- moving = rating of how moving the poem was
- beauty = rating of how beautiful the poem was
- melodious = rating of how melodious the poem was

dataROI.csv

The dataROI.csv data set contains data on the level of each line within a poem.
- order = order of presentation (1 = from A to D, 2 = from D to A; between-participant factor)
- participant = participant number
- poemIdentity = poem number
- lineNr = line number within poem
- poemVersion = version of poem presented (A = original poem with rhyme and meter, B = poem variant with only rhyme, C = poem variant with only meter, D = poem variant without rhyme and meter)
- verseEnd = whether a particular word/line was the last line of a stanza (0 = word/line within a stanza, 1 = last word/line of a stanza)
- BeginCloseRhyme = whether a particular line's final word marked the opening or closing of a rhyme pair (1 = opening of rhyme, 2 = closing of rhyme)
- lastFix = whether a particular line or word was the last one of the poem (0 = word/line within a poem, 1 = last word/line of poem)
- totalGazeByWordNA = total gaze duration of the final word of a line relative to word length
- gazeByLineLengthNA = total gaze duration of a line relative to line length
- dataIntegrity = percentage of valid position measurements by eye tracker during reading of a poem
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Citation metrics are widely used and misused. We have created a publicly available database of top-cited scientists that provides standardized information on citations, h-index, co-authorship-adjusted hm-index, citations to papers in different authorship positions, and a composite indicator (c-score). Separate data are shown for career-long and, separately, for single recent year impact. Metrics with and without self-citations and the ratio of citations to citing papers are given, and data on retracted papers (based on the Retraction Watch database) as well as citations to/from retracted papers have been added.

Scientists are classified into 22 scientific fields and 174 sub-fields according to the standard Science-Metrix classification. Field- and subfield-specific percentiles are also provided for all scientists with at least 5 papers. Career-long data are updated to end-of-2024 and single recent year data pertain to citations received during calendar year 2024. The selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field. This version (7) is based on the August 1, 2025 snapshot from Scopus, updated to the end of citation year 2024.

This work uses Scopus data. Calculations were performed using all Scopus author profiles as of August 1, 2025. If an author is not on the list, it is simply because the composite indicator value was not high enough to appear on the list. It does not mean that the author does not do good work. PLEASE ALSO NOTE THAT THE DATABASE HAS BEEN PUBLISHED IN AN ARCHIVAL FORM AND WILL NOT BE CHANGED. The published version reflects Scopus author profiles at the time of calculation. We thus advise authors to ensure that their Scopus profiles are accurate. REQUESTS FOR CORRECTIONS OF THE SCOPUS DATA (INCLUDING CORRECTIONS IN AFFILIATIONS) SHOULD NOT BE SENT TO US. They should be sent directly to Scopus, preferably by use of the Scopus to ORCID feedback wizard (https://orcid.scopusfeedback.com/), so that the correct data can be used in any future annual updates of the citation indicator databases.

The c-score focuses on impact (citations) rather than productivity (number of publications), and it also incorporates information on co-authorship and author positions (single, first, last author). If you have additional questions, see the attached file of FREQUENTLY ASKED QUESTIONS. Finally, we alert users that all citation metrics have limitations and their use should be tempered and judicious. For more reading, we refer to the Leiden Manifesto: https://www.nature.com/articles/520429a
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Scripts to read the data into Matlab are available here: https://github.com/myankov/EDFA-data-reading-scripts/

The dataset contains metadata, such as:
1) unique ID of the PSD profile
2) unique ID of the physical amplifier
3) total input and output power recordings of each EDFA
4) unique ID of the multi-span system, indicating the order of the EDFAs, as well as the fiber span lengths

It also contains PSD readings from an OSA for the input PSD and the output PSD at the OSA wavelengths.
For any bz2 file, it is recommended to use a parallel bzip2 decompressor (https://github.com/mxmlnkn/indexed_bzip2) for speed.
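A sketch of both routes in Python; the indexed_bzip2 call is an assumption based on that project's documented API, so verify it against the repository README:

```python
import bz2

# Single-threaded baseline from the standard library.
with bz2.open("data.csv.bz2", "rt") as f:  # hypothetical file name
    header = f.readline()

# Parallel variant (assumed API; see the indexed_bzip2 README):
# import os
# import indexed_bzip2 as ibz2
# with ibz2.open("data.csv.bz2", parallelization=os.cpu_count()) as f:
#     header = f.readline()
```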
In summary:
See forum discussion for details of [1],[2]: https://www.kaggle.com/competitions/leash-BELKA/discussion/492846
This has become somewhat obsolete as the competition progressed: ECFP6 gives better results and can be extracted quickly with scikit-fingerprints.
See forum discussion for details of [3]: https://www.kaggle.com/competitions/leash-BELKA/discussion/498858 https://www.kaggle.com/code/hengck23/lb6-02-graph-nn-example
See forum discussion for details of [4]: https://www.kaggle.com/competitions/leash-BELKA/discussion/505985 https://www.kaggle.com/code/hengck23/conforge-open-source-conformer-generator
In an apparatus for the preparation of a plurality of drinks from cartridges (K) that are provided with an optical code (C) on one of their faces (F) identifying the cartridge (K) itself and the corresponding drink, the reading of the optical code (C) is made more certain and reliable thanks to a projecting reading window (40).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Whole genome sequencing was performed on C. elegans strains with different rDNA copy numbers.
CB3740_eDp20_WS235_sort.bam – Aligned whole genome sequence data for C. elegans strain CB3740 (eDf24 I; eDp20 (I,II); mnT12 (IV,X))
CB3740_eDP20_WS235_sort_depth.txt – Read depth analysis file for whole genome sequencing of C. elegans strain CB3740 (eDf24 I; eDp20 (I,II); mnT12 (IV,X))
eDP20_chrI_13mil.txt – Read depth data for right arm of ChrI of C. elegans strain CB3740
N2_chrI_13mil.txt – Read depth data for right arm of ChrI of C. elegans wild type strain N2
SEA296_MY1_130E_chrI_merge_RG.bam – Aligned whole genome sequence data for C. elegans strain SEA296 (mIs13[myo-2p::GFP + pes-10p::GFP + F22B7.9p::GFP] I, catIR8[I:, N2>MY1]). Homozygous for 64-copy rDNA array.
SEA296_MY1_130E_merge.g.vcf – VCF for sequence variants in C. elegans strain SEA296 (mIs13[myo-2p::GFP + pes-10p::GFP + F22B7.9p::GFP] I, catIR8[I:, N2>MY1]). Homozygous for 64-copy rDNA array.
SEA300_duprm_RG.bam – Aligned whole genome sequence data for C. elegans strain SEA300 (catIR12[I:?-end , MY1>N2]). Homozygous for 417-copy rDNA array.
SEA300_duprm_RG.g.vcf – VCF for sequence variants in C. elegans strain SEA300 (catIR12[I:?-end , MY1>N2]). Homozygous for 417-copy rDNA array.
SEA302_S2_WS230_duprumRG.bam – Aligned whole genome sequence data for C. elegans strain SEA302 (catIR14[I:~13500000-end, JU775>N2]). Homozygous for 81-copy rDNA array.
SEA302_S2_WS230_duprmRG.g.vcf – VCF for sequence variants in C. elegans strain SEA302 (catIR14[I:~13500000-end, JU775>N2]). Homozygous for 81-copy rDNA array.
SEA305_S5_WS230_duprmRG.bam – Aligned whole genome sequence data for C. elegans strain SEA305 (catIR17[I:~3600000-end, MY16>N2]) Homozygous for 73-copy rDNA array.
SEA305_S5_WS230_duprmRG.g.vcf – VCF for sequence variants in C. elegans strain SEA305 (catIR17[I:~3600000-end, MY16>N2]) Homozygous for 73-copy rDNA array.
I have created this dataset to make it easier to analyse the progression of answers from the respondents who participate each year in the well-known Data Science Kaggle Survey.
The sources of the present data are: * 2017: https://www.kaggle.com/kaggle/kaggle-survey-2017 * 2018: https://www.kaggle.com/kaggle/kaggle-survey-2018 * 2019: https://www.kaggle.com/c/kaggle-survey-2019/data * 2020: https://www.kaggle.com/c/kaggle-survey-2020/data * 2021: https://www.kaggle.com/c/kaggle-survey-2021/data
This dataset was created by manually aggregating each of the 5 tables mentioned above. The full methodology was as follows:
The aggregation was done manually, as the question order, naming, and answer types differ from one year to another. Hence, the most accurate way (although not the most efficient) was to read, order, and pick the questions against the base table (the 2021 Survey).
This dataset contains the following:
- kaggle_survey_2017_2021.csv: the tabular dataset containing the aggregated data from 2017 to 2021.
- style.css: a file that serves as custom styling for my notebook on this competition.
- images folder: all images I have used for my notebook on this competition.

Note: Notebook can be found here.
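As a starting point for year-over-year comparisons, a short sketch; the 'year' column name is an assumption, so check the CSV header first:

```python
import pandas as pd

df = pd.read_csv("kaggle_survey_2017_2021.csv")
# Respondents per survey year (assumes a 'year' column marks the survey edition).
print(df["year"].value_counts().sort_index())
```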
Thank you so much to the Kaggle Team for hosting these surveys and sharing with us all the data, so we can take the pulse of the community each year.
The Kaggle Survey is rich in information as is, but what can you find by adding another layer of information, the year? Evolution over time could be fascinating.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Information
The diverse publicly available compound/bioactivity databases constitute a key resource for data-driven applications in chemogenomics and drug design. Analysis of their coverage of compound entries and biological targets revealed considerable differences, however, suggesting the benefit of a consensus dataset. Therefore, we have combined and curated information from five esteemed databases (ChEMBL, PubChem, BindingDB, IUPHAR/BPS and Probes&Drugs) to assemble a consensus compound/bioactivity dataset comprising 1,144,803 compounds with 10,915,362 bioactivities on 5,613 targets (including defined macromolecular targets as well as cell lines and phenotypic readouts). It also provides simplified information on the assay types underlying the bioactivity data and on bioactivity confidence by comparing data from different sources. We have unified the source databases, brought them into a common format, and combined them, enabling ease of generic use in multiple applications such as chemogenomics and data-driven drug design.
The consensus dataset provides increased target coverage and contains a higher number of molecules compared to the source databases which is also evident from a larger number of scaffolds. These features render the consensus dataset a valuable tool for machine learning and other data-driven applications in (de novo) drug design and bioactivity prediction. The increased chemical and bioactivity coverage of the consensus dataset may improve robustness of such models compared to the single source databases. In addition, semi-automated structure and bioactivity annotation checks with flags for divergent data from different sources may help data selection and further accurate curation.
Structure and content of the dataset
| ChEMBL ID | PubChem ID | IUPHAR ID | Target | Activity type | Assay type | Unit | Mean C (0) | ... | Mean PC (0) | ... | Mean B (0) | ... | Mean I (0) | ... | Mean PD (0) | ... | Activity check annotation | Ligand names | Canonical SMILES C | ... | Structure check | Source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
The dataset was created using the Konstanz Information Miner (KNIME) (https://www.knime.com/) and was exported as a CSV-file and a compressed CSV-file.
Except for the canonical SMILES columns, all columns are filled with the datatype 'string'. The datatype for the canonical SMILES columns is the SMILES format. We recommend the File Reader node for using the dataset in KNIME; with this node, the data types of the columns can be adjusted exactly. In addition, only this node can read the compressed format.
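Outside KNIME, the CSV export can be read with ordinary tooling; a Python sketch with a placeholder filename, forcing string columns to mirror the types described above:

```python
import pandas as pd

# dtype=str mirrors the note that all non-SMILES columns hold strings;
# pandas handles the compressed variant transparently via the .gz suffix.
df = pd.read_csv("consensus_compound_bioactivity.csv.gz", dtype=str)  # hypothetical filename
print(df.columns.tolist())
```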
Column content:
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
UPDATE 1/7/2025: On June 28th 2023, the San Francisco Police Department (SFPD) changed its Stops Data Collection System (SDCS). As a result of this change, record identifiers have changed from the Department of Justice (DOJ) identifier to an internal record numbering system (referred to as "LEA Record ID"). The data that SFPD uploads to the DOJ system will contain the internal record number which can be used for joins with the data available on DataSF.
A. SUMMARY The San Francisco Police Department (SFPD) Stop Data was designed to capture information to comply with the Racial and Identity Profiling Act (RIPA), or California Assembly Bill (AB)953. SFPD officers collect specific information on each stop, including elements of the stop, circumstances and the perceived identity characteristics of the individual(s) stopped. The information obtained by officers is reported to the California Department of Justice. This dataset includes data on stops starting on July 1st, 2018, which is when the data collection program went into effect. Read the detailed overview for this dataset here.
B. HOW THE DATASET IS CREATED By the end of each shift, officers enter all stop data into the Stop Data Collection System, which is automatically submitted to the California Department of Justice (CA DOJ). Once a quarter the Department receives a stops data file from CA DOJ. The SFPD conducts several transformations of this data to ensure privacy, accuracy and compliance with State law and regulation. For increased usability, text descriptions have also been added for several data fields which include numeric codes (including traffic, suspicion, citation, and custodial arrest offense codes, and actions taken as a result of a stop). See the data dictionaries below for explanations of all coded data fields. Read more about the data collection, and transformation, including geocoding and PII cleaning processes, in the detailed overview of this dataset.
C. UPDATE PROCESS Information is updated on a quarterly basis.
D. HOW TO USE THIS DATASET This dataset includes information about police stops that occurred, including some details about the person(s) stopped and what happened during the stop. Each row is a person stopped, with a record identifier for the stop and a unique identifier for the person. A single stop may involve multiple people and may produce more than one associated unique identifier for the same record identifier. A certain percentage of stops have stop information that can't be geocoded. This may be due to errors in data input at the officer level (typos in entry or providing an address that doesn't exist). More often, it is due to officers providing a level of detail that isn't codable to a geographic coordinate, most often at the Airport (i.e., Terminal 3, door 22). In these cases, the location of the stops is coded as unknown.
E. DATA DICTIONARIES CJIS Offense Codes data look up table
Utilisation of this data is subject to the European Space Agency's Earth Observation Terms and Conditions. Read the T&C here.
This is Dataset Version 3 - Updates may be done following feedback from the machine learning community.
This dataset contains 327 time series corresponding to the temporal values of 327 telemetry parameters over the life of the real GOCE satellite (from March 2009 to October 2013). It contains both the raw data and machine-learning ready-to-use resampled data (a Python loading sketch follows the file list):
- The raw values (calibrated values of each parameter) as {param}_raw.parquet files (irregular sampling)
- Resampled popular statistics computed over 10-minute windows for each parameter, as {param}_stats_10min.parquet files
- Resampled popular statistics computed over 6-hour windows for each parameter, as {param}_stats_6h.parquet files
- metadata.csv: a list of all parameters with description, subsystem, first and last timestamp where a value is recorded, fraction of NaN in the calculated statistics, and the longest data gap
- mass_properties.csv: information relative to the satellite mass (for example, the remaining fuel on board)
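The loading sketch (requires a parquet engine such as pyarrow; the assumption that the first metadata.csv column holds the parameter name should be checked against the actual file):

```python
import pandas as pd

meta = pd.read_csv("metadata.csv")
param = str(meta.iloc[0, 0])                             # assumes column 0 holds the parameter name
raw = pd.read_parquet(f"{param}_raw.parquet")            # irregular raw values
stats = pd.read_parquet(f"{param}_stats_10min.parquet")  # 10-minute window statistics
print(raw.shape, stats.shape)
```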
The Gravity Field and Steady-State Ocean Circulation Explorer (GOCE; pronounced ‘go-chay’), is a scientific mission satellite from the European Space Agency (ESA).
GOCE's primary mission objective was to provide an accurate and detailed global model of Earth's gravity field and geoid. For this purpose, it was equipped with a state-of-the-art gravity gradiometer and a precise tracking system.
The satellite's main payload was the Electrostatic Gravity Gradiometer (EGG), used to measure the gravity field of Earth. Other payloads were an onboard GPS receiver used as a Satellite-to-Satellite Tracking Instrument (SSTI) and a compensation system for all non-gravitational forces acting on the spacecraft. The satellite was also equipped with a laser retroreflector to enable tracking by ground-based satellite laser ranging stations.
The satellite's unique arrow shape and fins helped keep GOCE stable as it flew through the thermosphere at a comparatively low altitude of 255 kilometres (158 mi). Additionally, an ion propulsion system continuously compensated for the variable deceleration due to air drag without the vibration of a conventional chemically powered rocket engine, thus limiting the errors in gravity gradient measurements caused by non-gravitational forces and restoring the path of the craft as closely as possible to a purely inertial trajectory.
Due to the orbit and satellite configuration, the solar panels experienced extreme temperature variations. The design therefore had to include materials that could tolerate temperatures as high as 160 degC and as low as -170 degC.
Due to its stringent temperature stability requirements (for the gradiometer sensor heads, in the range of milli-Kelvin) the gradiometer was thermally decoupled from the satellite and had its own dedicated thermal-control system.
Flight operations were conducted from the European Space Operations Centre, based in Darmstadt, Germany.
It was launched on 17 March 2009 and came to an end of mission on 21 October 2013 because it ran out of propellant. As planned, the satellite then began dropping out of orbit and made an uncontrolled re-entry on 11 November 2013.
GOCE used a Sun-synchronous orbit with an inclination of 96.7 degrees, a mean altitude of approximately 263 km, an orbital period of 90 minutes, and a mean local solar time at the ascending node of 18:00.
Minhas BF, Beck EA, Cheng CC-H, Catchen JM. (2022). Novel mitochondrial genome rearrangements including duplications and extensive heteroplasmy in Antarctic notothenioid fishes. bioRxiv 2022.09.19.508608; doi: https://doi.org/10.1101/2022.09.19.508608
Mitochondrial genome assembly and annotation for the white-blooded, Antarctic blackfin icefish (Chaenocephalus aceratus). Mt genome shows 3 tandemly duplicated ND6 copies and evidence of heteroplasmy.
Mitochondrial genome assembly and annotation for the white-blooded, secondarily temperate pike icefish (Champsocephalus esox). Mt genome shows 4 tandemly duplicated ND6 copies and evidence of heteroplasmy.
Mitochondrial genome assembly and annotation for the white-blooded, ...
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Datasets showing nitrogen dioxide (NO2) levels at various locations around Leeds.

Please note: from 17/02/17 this dataset will be archived and superseded by the ratified air quality dataset (https://datamillnorth.org/dataset/ratified-air-quality---nitrogen-dioxide), which contains corroborated data quality-checked by external auditors.

Additional information: The data is collected on an hourly basis.
- Column A = Date of collection (YYMMDD)
- Column B = Time of collection
- Column C = Reading
- Column D = Validation (14 means the data has been validated, but not that it has been ratified)

NOTE: The data is not necessarily collected for all dates/times/stations. A parsing sketch appears after this description.

Defra air quality data: Defra manage a further two stations in Leeds. You can be sent information through their air quality bulletin and request up to hourly information from http://uk-air.defra.gov.uk/bulletin-subscription. Archive CSV data can be downloaded from http://uk-air.defra.gov.uk/data/data_selector?=l&1=&s=&o=#mid
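The parsing sketch for the four-column layout (the filename and the absence of a header row are assumptions; adjust to the actual file):

```python
import pandas as pd

no2 = pd.read_csv(
    "leeds_no2.csv",  # hypothetical filename
    header=None,
    names=["date", "time", "reading", "validation"],
    dtype={"date": str},  # keep leading zeros so YYMMDD parses correctly
)
no2["date"] = pd.to_datetime(no2["date"], format="%y%m%d")  # Column A is YYMMDD
print(no2.head())
```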
A. SUMMARY This dataset is used to report on public dataset access and usage within the open data portal. Each row sums the number of users who access a dataset each day, grouped by access type (API Read, Download, Page View, etc.).

B. HOW THE DATASET IS CREATED This dataset is created by joining two internal analytics datasets generated by the SF Open Data Portal. We remove non-public information during the process.

C. UPDATE PROCESS This dataset is scheduled to update every 7 days via ETL.

D. HOW TO USE THIS DATASET This dataset can help you identify stale datasets, highlight the most popular datasets, and calculate other metrics around performance and usage in the open data portal. Please note a special call-out for two fields:
- "derived": This field shows if an asset is an original source (derived = "False") or if it is made from another asset through filtering (derived = "True"). Essentially, whether it is derived from another source or not.
- "provenance": This field shows if an asset is "official" (created by someone in the City of San Francisco) or "community" (created by a member of the community, not official). All community assets are derived, as members of the community cannot add data to the open data portal.