81 datasets found
  1. C Street Cross Street Data in Reading, MA

    • ownerly.com
    Updated Dec 6, 2021
    Cite
    Ownerly (2021). C Street Cross Street Data in Reading, MA [Dataset]. https://www.ownerly.com/ma/reading/c-st-home-details
    Explore at:
    Dataset updated
    Dec 6, 2021
    Dataset authored and provided by
    Ownerly
    Area covered
    Reading, Massachusetts
    Description

    This dataset provides information about the number of properties, residents, and average property values for C Street cross streets in Reading, MA.

  2. race-c

    • huggingface.co
    Updated Apr 6, 2023
    Cite
    tasksource (2023). race-c [Dataset]. https://huggingface.co/datasets/tasksource/race-c
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 6, 2023
    Dataset authored and provided by
    tasksource
    Description

    Race-C: additional data for RACE (high school/middle school), but at the college level. https://github.com/mrcdata/race-c

    @InProceedings{pmlr-v101-liang19a,
      title = {A New Multi-choice Reading Comprehension Dataset for Curriculum Learning},
      author = {Liang, Yichan and Li, Jianheng and Yin, Jian},
      booktitle = {Proceedings of The Eleventh Asian Conference on Machine Learning},
      pages = {742--757},
      year = {2019}
    }

  3. NNDSS - TABLE 1Q. Hepatitis B, perinatal infection to Hepatitis C, acute,...

    • catalog.data.gov
    • data.virginia.gov
    • +2more
    Updated Jul 9, 2025
    Cite
    Centers for Disease Control and Prevention (2025). NNDSS - TABLE 1Q. Hepatitis B, perinatal infection to Hepatitis C, acute, Probable [Dataset]. https://catalog.data.gov/dataset/nndss-table-1q-hepatitis-b-perinatal-infection-to-hepatitis-c-acute-probable-2ee3e
    Explore at:
    Dataset updated
    Jul 9, 2025
    Dataset provided by
    Centers for Disease Control and Prevention (http://www.cdc.gov/)
    Description

    NNDSS - TABLE 1Q. Hepatitis B, perinatal infection to Hepatitis C, acute, Probable - 2022. In this table, provisional cases* of notifiable diseases are displayed for the United States, U.S. territories, and non-U.S. residents.

    Notes:
    • These are weekly cases of selected infectious national notifiable diseases, from the National Notifiable Diseases Surveillance System (NNDSS). NNDSS data reported by the 50 states, New York City, the District of Columbia, and the U.S. territories are collated and published weekly as numbered tables available at https://www.cdc.gov/nndss/data-statistics/index.html. Cases reported by state health departments to CDC for weekly publication are subject to ongoing revision of information and delayed reporting; therefore, numbers listed in later weeks may reflect changes made to these counts as additional information becomes available. Case counts in the tables are presented as published each week. See also the Guide to Interpreting Provisional and Finalized NNDSS Data at https://www.cdc.gov/nndss/docs/Readers-Guide-WONDER-Tables-20210421-508.pdf.
    • Notices, errata, and other notes are available on the Notice To Data Users page at https://wonder.cdc.gov/nndss/NTR.html.
    • The list of national notifiable infectious diseases and conditions and their national surveillance case definitions is available at https://ndc.services.cdc.gov/. This list incorporates the Council of State and Territorial Epidemiologists (CSTE) position statements approved by CSTE for national surveillance.

    Footnotes:
    * Case counts for reporting years 2021 and 2022 are provisional and subject to change. Cases are assigned to the reporting jurisdiction submitting the case to NNDSS if the case's country of usual residence is the U.S., a U.S. territory, unknown, or null (i.e., country not reported); otherwise, the case is assigned to the 'Non-U.S. Residents' category. Country of usual residence is currently not reported by all jurisdictions or for all conditions. For further information on interpretation of these data, see https://www.cdc.gov/nndss/docs/Readers-Guide-WONDER-Tables-20210421-508.pdf.
    † Previous 52-week maximum and cumulative YTD are determined from periods of time when the condition was reportable in the jurisdiction (i.e., may be less than 52 weeks of data or incomplete YTD data).
    U: Unavailable — the reporting jurisdiction was unable to send the data to CDC or CDC was unable to process the data.
    -: No reported cases — the reporting jurisdiction did not submit any cases to CDC.
    N: Not reportable — the disease or condition was not reportable by law, statute, or regulation in the reporting jurisdiction.
    NN: Not nationally notifiable — this condition was not designated as being nationally notifiable.
    NP: Nationally notifiable but not published.
    NC: Not calculated — there is insufficient data available to support the calculation of this statistic.
    Cum: Cumulative year-to-date counts.
    Max: Maximum case count during the previous 52 weeks.

  4. Dataset of psychophysiological data from children with learning difficulties...

    • openneuro.org
    Updated May 26, 2025
    Cite
    César E. Corona-González; Claudia Rebeca De Stefano-Ramos; Juan Pablo Rosado-Aíza; David I. Ibarra-Zarate; Fabiola R. Gómez-Velázquez; Luz María Alonso-Valerdi (2025). Dataset of psychophysiological data from children with learning difficulties who strengthen reading and math skills through assistive technology [Dataset]. http://doi.org/10.18112/openneuro.ds006260.v1.0.0
    Explore at:
    Dataset updated
    May 26, 2025
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    César E. Corona-González; Claudia Rebeca De Stefano-Ramos; Juan Pablo Rosado-Aíza; David I. Ibarra-Zarate; Fabiola R. Gómez-Velázquez; Luz María Alonso-Valerdi
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    README

    Authors

    César E. Corona-González, Claudia Rebeca De Stefano-Ramos, Juan Pablo Rosado-Aíza, Fabiola R Gómez-Velázquez, David I. Ibarra-Zarate, Luz María Alonso-Valerdi

    Contact person

    César E. Corona-González

    https://orcid.org/0000-0002-7680-2953

    a00833959@tec.mx

    Project name

    Psychophysiological data from Mexican children with learning difficulties who strengthen reading and math skills by assistive technology

    Year that the project ran

    2023

    Brief overview of the tasks in the experiment

    The current dataset consists of psychometric and electrophysiological data from children with reading or math learning difficulties. These data were collected to evaluate improvements in reading or math skills resulting from using an online learning method called Smartick.

    The psychometric evaluations for children with reading difficulties comprised: spelling tests, in which 1) orthographic and 2) phonological errors were counted; 3) reading speed, expressed in words read per minute; and 4) reading comprehension, assessed with multiple-choice questions. The last two parameters were determined according to the standards of the Ministry of Public Education (Secretaría de Educación Pública) in Mexico. The assessments for the math difficulties group comprised: 1) an assessment of general mathematical knowledge, 2) the percentage of hits, and 3) reaction time in an arithmetical task. Additionally, selective attention and intelligence quotient (IQ) were evaluated in both groups.

    Then, individuals underwent an EEG experimental paradigm where two conditions were recorded: 1) a 3-minute eyes-open resting state and 2) performing either reading or mathematical activities. EEG recordings from the reading experiment consisted of reading a text aloud and then answering questions about the text. Alternatively, EEG recordings from the math experiment involved the solution of two blocks with 20 arithmetic operations (addition and subtraction). Subsequently, each child was randomly subcategorized as 1) the experimental group, who were asked to engage with Smartick for three months, and 2) the control group, who were not involved with the intervention. Once the 3-month period was over, every child was reassessed as described before.

    Description of the contents of the dataset

    The dataset contains a total of 76 subjects (sub-), spanning two study groups: 1) reading difficulties (R) and 2) math difficulties (M). Each individual was further subcategorized into the experimental subgroup (e), whose children committed to engaging with Smartick, or the control subgroup (c), which did not receive any intervention.

    Every subject was followed up on for three months. During this period, each subject underwent two EEG sessions, representing the PRE-intervention (ses-1) and the POST-intervention (ses-2).

    The EEG recordings from the reading difficulties group consisted of a resting state condition (run-1) and while performing active reading and reading comprehension activities (run-2). On the other hand, EEG data from the math difficulties group was collected from a resting state condition (run-1) and when solving two blocks of 20 arithmetic operations (run-2 and run-3). All EEG files were stored in .set format. The nomenclature and description from filenames are shown below:

    Nomenclature  Description
    sub-          Subject
    M             Math group
    R             Reading group
    c             Control subgroup
    e             Experimental subgroup
    ses-1         PRE-intervention
    ses-2         POST-intervention
    run-1         EEG for baseline
    run-2         EEG for reading activity, or the first block of math
    run-3         EEG for the second block of math

    Example: the file sub-Rc11_ses-1_task-SmartickDataset_run-2_eeg.set corresponds to:
    • the 11th subject from the reading difficulties group, control subgroup (sub-Rc11)
    • the EEG recording from the PRE-intervention (ses-1) while performing the reading activity (run-2)
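    Because the nomenclature maps directly onto the filenames, recordings can be sorted programmatically. Below is a minimal Python sketch; the parse_eeg_filename helper is illustrative, not part of the dataset:

    ```python
    import re

    # Decode filenames such as sub-Rc11_ses-1_task-SmartickDataset_run-2_eeg.set
    # following the nomenclature table above. The helper is our own, for illustration.
    FILENAME_RE = re.compile(
        r"sub-(?P<group>[MR])(?P<subgroup>[ce])(?P<subject>\d+)"
        r"_ses-(?P<session>[12])_task-SmartickDataset_run-(?P<run>[123])_eeg\.set"
    )

    def parse_eeg_filename(name: str) -> dict:
        """Return group/subgroup/subject/session/run for one EEG file name."""
        m = FILENAME_RE.fullmatch(name)
        if m is None:
            raise ValueError(f"unrecognized filename: {name}")
        return {
            "group": "math" if m["group"] == "M" else "reading",
            "subgroup": "control" if m["subgroup"] == "c" else "experimental",
            "subject": int(m["subject"]),
            "session": "PRE" if m["session"] == "1" else "POST",
            "run": int(m["run"]),
        }

    info = parse_eeg_filename("sub-Rc11_ses-1_task-SmartickDataset_run-2_eeg.set")
    ```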

    Independent variables

    • Study groups:
      • Reading difficulties
        • Control: children did not follow any intervention
        • Experimental: Children used the reading program of Smartick for 3 months
      • Math difficulties
        • Control: children did not follow any intervention
        • Experimental: Children used the math program of Smartick for 3 months
    • Condition:
      • PRE-intervention: first psychological and electroencephalographic evaluation
      • POST-intervention: second psychological and electroencephalographic evaluation

    Dependent variables

    • Psychometric data from the reading difficulties group:

      • Orthographic_ERR: number of orthographic errors.
      • Phonological_ERR: number of phonological errors.
      • Selective_Attention: score from the selective attention test.
      • Reading_Speed: reading speed in words per minute.
      • Comprehension: score on a reading comprehension task.
      • GROUP: C for the control group, E for the experimental group.
      • GENDER: M for male, F for Female.
      • AGE: age at the beginning of the study.
      • IQ: intelligence quotient.
    • Psychometric data from the math difficulties group:

      • WRAT4: score from the WRAT-4 test.
      • hits: hits during the EEG acquisition [%].
      • RT: reaction time during the EEG acquisition [s].
      • Selective_Attention: score from the selective attention test.
      • GROUP: C for the control Group, E for the experimental group.
      • GENDER: M for male, F for female.
      • AGE: age at the beginning of the study.
      • IQ: intelligence quotient.

    Psychometric data can be found in the 01_Psychometric_Data.xlsx file

    • Engagement percentage within Smartick (only for the experimental group)
      • These values represent the percentage of engagement with Smartick.
      • Students were asked to engage with the online learning method for 3 months, 5 days a week.
      • Values greater than 100% denote participants who regularly logged in more than 5 days weekly.

    Engagement percentages can be found in the 05_SessionEngagement.xlsx file

    Methods

    Subjects

    Seventy-six Mexican children between 7 and 13 years old were enrolled in this study.

    Information about the recruitment procedure

    The sample was recruited through non-profit foundations that support learning and foster care programs.

    Apparatus

    g.USBamp RESEARCH amplifier

    Initial setup

    1. Explain the task to the participant.
    2. Sign informed consent.
    3. Set up electrodes.

    Task details

    The nested stimuli folder contains all stimuli employed in the EEG experiments.

    Level 1
    • Math: Images used in the math experiment.
    • Reading: Images used in the reading experiment.

    Level 2
    • Math
      • POST_Operations: arithmetic operations from the POST-intervention.
      • PRE_Operations: arithmetic operations from the PRE-intervention.
    • Reading
      • POST_Reading1: text 1 and text-related comprehension questions from the POST-intervention.
      • POST_Reading2: text 2 and text-related comprehension questions from the POST-intervention.
      • POST_Reading3: text 3 and text-related comprehension questions from the POST-intervention.
      • PRE_Reading1: text 1 and text-related comprehension questions from the PRE-intervention.
      • PRE_Reading2: text 2 and text-related comprehension questions from the PRE-intervention.
      • PRE_Reading3: text 3 and text-related comprehension questions from the PRE-intervention.

    Level 3
    • Math
      • Operation01.jpg to Operation20.jpg: arithmetical operations solved during the first block of the math

  5. MIEDT dataset

    • kaggle.com
    Updated Jan 12, 2025
    Cite
    机关鸢鸟 (2025). MIEDT dataset [Dataset]. https://www.kaggle.com/datasets/lidang78/miedt-dataset
    Explore at:
    Croissant
    Dataset updated
    Jan 12, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    机关鸢鸟
    Description
      1. Dataset Overview. This dataset is organized around the edge detection task, aiming to provide rich image resources and corresponding edge detection annotations for related research and applications; it can be used to test edge detection algorithms. To evaluate the performance of edge detection methods comprehensively, we created the Medical Image Edge Detection Test (MIEDT) dataset. MIEDT contains 100 medical images, randomly selected from three publicly available datasets: Head CT-hemorrhage, Coronary Artery Diseases DataSet, and Skin Cancer MNIST: HAM10000.
      2. Dataset Structure. Original image: this folder stores the original image data. It contains 15 head CT images in PNG format with varying resolutions; 25 coronary heart disease images in JPG format with a resolution of 1024 x 1024; and 60 skin images in JPG format with a resolution of 600 x 450. It covers a variety of medical image materials with different imaging and contrast, providing diverse input data for edge detection algorithms. Ground truth: the data in this folder are the edge detection annotation images corresponding to the images in the "Original image" folder, in PNG format. White pixels represent the edge parts of the image, and black pixels represent the non-edge areas. These annotations accurately outline the object contours and edge features in the original images.
      3. Usage Instructions. Users processing images in Python can use the cv2 (OpenCV) library to read the image data. The sample code is as follows:

    import cv2
    original_image = cv2.imread('Original image/IMG-001.png')  # read an original image
    ground_truth_image = cv2.imread('Ground truth/GT-001.png', cv2.IMREAD_GRAYSCALE)  # read the corresponding ground truth image

    When training models with deep learning frameworks (such as TensorFlow or PyTorch), configure the dataset path in the framework's dataset loading class, following its data loading mechanism, so that the model can correctly read and process the images and their annotation data.
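    As an illustration of that loading step, here is a small pure-Python sketch that pairs each original image with its ground-truth file. It assumes the IMG-<id>/GT-<id> naming shown in the sample code, which may differ from the released files:

    ```python
    import os

    def pair_images(original_names):
        """Pair each original filename with its assumed ground-truth counterpart.

        Assumes originals are named IMG-<id>.<ext> and masks GT-<id>.png, as in
        the sample code above; adjust if the released files use other names.
        """
        pairs = []
        for name in sorted(original_names):
            stem, _ext = os.path.splitext(name)   # e.g. 'IMG-001'
            image_id = stem.split("-", 1)[1]      # e.g. '001'
            pairs.append((f"Original image/{name}",
                          f"Ground truth/GT-{image_id}.png"))
        return pairs

    pairs = pair_images(["IMG-002.jpg", "IMG-001.png"])
    ```

    The resulting path pairs can then be fed to any dataset loading class.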

      4. Data Sources and References. Data sources: the original images are collected from the public image datasets Head CT-hemorrhage, Coronary Artery Diseases DataSet, and Skin Cancer MNIST: HAM10000, ensuring the quality and diversity of the images. If you use this dataset in academic research, please cite the following literature.

    References: [1] Noel Codella, Veronica Rotemberg, Philipp Tschandl, M. Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael Marchetti, Harald Kittler, Allan Halpern: “Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC)”, 2018; https://arxiv.org/abs/1902.03368

    [2] Tschandl, P., Rosendahl, C. & Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 5, 180161 doi:10.1038/sdata.2018.161 (2018).

    [3] Classification of Brain Hemorrhage Using Deep Learning from CT Scan Images - https://link.springer.com/chapter/10.1007/978-981-19-7528-8_15

  6. Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...

    • zenodo.org
    • data.europa.eu
    zip
    Updated Oct 20, 2022
    Cite
    Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Athena Vakali; Joao Palotti; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. http://doi.org/10.5281/zenodo.6832242
    Explore at:
    zip (available download format)
    Dataset updated
    Oct 20, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Athena Vakali; Joao Palotti; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LifeSnaps Dataset Documentation

    Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data, will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.

    The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.

    Data Import: Reading CSV

    For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
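    For instance, a daily-granularity CSV could be loaded and aggregated as below. The column names here are invented for the sketch; consult the released files for the actual schema:

    ```python
    import io
    import pandas as pd

    # Synthetic stand-in for one of the daily CSV files; the column names are
    # assumptions for this sketch, not the dataset's actual schema.
    csv_text = io.StringIO(
        "user_id,date,steps,sleep_minutes\n"
        "p01,2021-06-01,8432,412\n"
        "p01,2021-06-02,10211,388\n"
    )
    df = pd.read_csv(csv_text, parse_dates=["date"])
    mean_steps = df.groupby("user_id")["steps"].mean()
    ```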

    Data Import: Setting up a MongoDB (Recommended)

    To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.

    To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have the MongoDB Database Tools installed.

    For the Fitbit data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c fitbit 

    For the SEMA data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c sema 

    For surveys data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c surveys 

    If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.

    Data Availability

    The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain information related to these collections. Each document in any collection follows the format shown below:

    {
      _id: 
  7. Data from: Eyelit: Eye-movement and reader response data during literary...

    • ssh.datastations.nl
    bin, csv, pdf, txt +1
    Updated Apr 6, 2021
    Cite
    H.M.L. Mak; R.M. Willems (2021). Eyelit: Eye-movement and reader response data during literary reading [Dataset]. http://doi.org/10.17026/DANS-ZQK-ZMQS
    Explore at:
    Available download formats: bin, csv, pdf, txt, zip (per-file size listing omitted)
    Dataset updated
    Apr 6, 2021
    Dataset provided by
    DANS Data Station Social Sciences and Humanities
    Authors
    H.M.L. Mak; R.M. Willems
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is an extensive eye-tracking dataset of 102 participants, each reading three Dutch literary short stories (7,790 words per participant). The preprocessed data set includes (1) a fixation report (fixation-level), (2) a saccade report, (3) an interest area report (word-level), (4) a trial report (aggregated data for each page; stories were split up into 30 pages each), (5) a sample report (the data were sampled at 500 Hz; this report includes data on all individual samples), (6) questionnaire data on reading experiences and other participant characteristics, and (7) word characteristics for all words in the stories (with the potential of calculating additional word characteristics).

    The study for which this data set was collected explored the effect of simulation on reading behavior by means of eye tracking. We hypothesized (A) that simulation would lead to longer fixation times for parts of the text high in simulation-eliciting content. Additionally, we hypothesized (B) that we would find personal preferences in the reaction to different types of simulation-eliciting content. We expected (C) that the findings from the eye tracking data would be related to self-report of simulation. Finally, we expected (D) that the amount of simulation would be predictive of self-report of appreciation. We found (A) longer reading times for perceptual and mental event simulation, but shorter reading times for motor simulation. The strength of the relationship between simulation and reading times varied between participants, but was positively correlated across the different types of simulation-eliciting content (B). Regarding (C) and (D), we found that this variation in the strength of the relationship between simulation and reading times was indeed related to aspects of self-reported simulation, absorption and appreciation. The findings from this study are described in detail in Mak & Willems (2019): https://doi.org/10.1080/23273798.2018.1552007

  8. Dataset: What the Eyes Reveal about (Reading) Poetry

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Dec 16, 2020
    Cite
    Wallot, Sebastian; Menninghaus, Winfried (2020). Dataset: What the Eyes Reveal about (Reading) Poetry [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000555746
    Explore at:
    Dataset updated
    Dec 16, 2020
    Authors
    Wallot, Sebastian; Menninghaus, Winfried
    Description

    dataPOEM.csv
    The dataPOEM.csv data set contains data at the level of each poem.
    • scoresAes = factor scores of moving, beauty, and melodious ratings
    • participant = participant number
    • poemVersion = version of poem presented (A = original poem with rhyme and meter, B = poem variant with only rhyme, C = poem variant with only meter, D = poem variant without rhyme and meter)
    • poemIdentity = poem number
    • avgWFreq = average word frequency of poem
    • totalGazeSlopeLineLength
    • totalGazeWordMeanNAByWordLen
    • totalGazeWordMeanNADiff
    • order = order of presentation (1 = from A to D, 2 = from D to A; between-participant factor)
    • firstFixDurMS_MINFIX_AVG = first fixation duration
    • totalGazeMS_MINFIX_AVG = total gaze durations
    • fixDurMS_MINFIX_NUM = number of fixations
    • sacLenMS_MINFIX_AVG = average saccade length
    • percRegMS_MINFIX_AVG = percentage of regressive eye movements
    • pupilDial_AVG = average pupil dilation
    • blink_NUM_TotalRT = number of blinks relative to total reading time
    • totalReadingTime = total reading time of the poem
    • areaTT = total score of the Aesthetic Responsiveness Assessment questionnaire
    • dataIntegrity = percentage of valid position measurements by the eye tracker during reading of a poem
    • moving = rating of how moving the poem was
    • beauty = rating of how beautiful the poem was
    • melodious = rating of how melodious the poem was

    dataROI.csv
    The dataROI.csv data set contains data at the level of each line within a poem.
    • order = order of presentation (1 = from A to D, 2 = from D to A; between-participant factor)
    • participant = participant number
    • poemIdentity = poem number
    • lineNr = line number within poem
    • poemVersion = version of poem presented (A = original poem with rhyme and meter, B = poem variant with only rhyme, C = poem variant with only meter, D = poem variant without rhyme and meter)
    • verseEnd = whether a particular word/line was the last line of a stanza (0 = word/line within a stanza, 1 = last word/line of a stanza)
    • BeginCloseRhyme = whether a particular line's final word marked the opening or closing of a rhyme pair (1 = opening of rhyme, 2 = closing of rhyme)
    • lastFix = whether a particular line or word was the last one of the poem (0 = word/line within a poem, 1 = last word/line of poem)
    • totalGazeByWordNA = total gaze duration of final word of a line relative to word length
    • gazeByLineLengthNA = total gaze duration of a line relative to line length
    • dataIntegrity = percentage of valid position measurements by the eye tracker during reading of a poem

  9. August 2025 data-update for "Updated science-wide author databases of...

    • elsevier.digitalcommonsdata.com
    Updated Sep 19, 2025
    + more versions
    Cite
    John P.A. Ioannidis (2025). August 2025 data-update for "Updated science-wide author databases of standardized citation indicators" [Dataset]. http://doi.org/10.17632/btchxktzyw.8
    Explore at:
    Dataset updated
    Sep 19, 2025
    Authors
    John P.A. Ioannidis
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    Citation metrics are widely used and misused. We have created a publicly available database of top-cited scientists that provides standardized information on citations, h-index, co-authorship adjusted hm-index, citations to papers in different authorship positions and a composite indicator (c-score). Separate data are shown for career-long and, separately, for single recent year impact. Metrics with and without self-citations and ratio of citations to citing papers are given and data on retracted papers (based on Retraction Watch database) as well as citations to/from retracted papers have been added. Scientists are classified into 22 scientific fields and 174 sub-fields according to the standard Science-Metrix classification. Field- and subfield-specific percentiles are also provided for all scientists with at least 5 papers. Career-long data are updated to end-of-2024 and single recent year data pertain to citations received during calendar year 2024. The selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field. This version (7) is based on the August 1, 2025 snapshot from Scopus, updated to end of citation year 2024. This work uses Scopus data. Calculations were performed using all Scopus author profiles as of August 1, 2025. If an author is not on the list, it is simply because the composite indicator value was not high enough to appear on the list. It does not mean that the author does not do good work. PLEASE ALSO NOTE THAT THE DATABASE HAS BEEN PUBLISHED IN AN ARCHIVAL FORM AND WILL NOT BE CHANGED. The published version reflects Scopus author profiles at the time of calculation. We thus advise authors to ensure that their Scopus profiles are accurate. REQUESTS FOR CORRECTIONS OF THE SCOPUS DATA (INCLUDING CORRECTIONS IN AFFILIATIONS) SHOULD NOT BE SENT TO US.
They should be sent directly to Scopus, preferably by use of the Scopus to ORCID feedback wizard (https://orcid.scopusfeedback.com/) so that the correct data can be used in any future annual updates of the citation indicator databases. The c-score focuses on impact (citations) rather than productivity (number of publications) and it also incorporates information on co-authorship and author positions (single, first, last author). If you have additional questions, see attached file on FREQUENTLY ASKED QUESTIONS. Finally, we alert users that all citation metrics have limitations and their use should be tempered and judicious. For more reading, we refer to the Leiden manifesto: https://www.nature.com/articles/520429a

  10. Input-output power spectral densities for three C-band EDFAs and four...

    • data.dtu.dk
    txt
    Updated Jul 12, 2023
    Cite
    Metodi Plamenov Yankov; Francesco Da Ros (2023). Input-output power spectral densities for three C-band EDFAs and four multi-span inline EDFAd fiber optic systems of different lengths [Dataset]. http://doi.org/10.11583/DTU.13135754.v1
    Explore at:
    txt. Available download formats
    Dataset updated
    Jul 12, 2023
    Dataset provided by
    Technical University of Denmark
    Authors
    Metodi Plamenov Yankov; Francesco Da Ros
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Scripts to read the data into Matlab are available here: https://github.com/myankov/EDFA-data-reading-scripts/ The dataset contains metadata such as: 1) a unique ID of the PSD profile, 2) a unique ID of the physical amplifier, 3) total input and output power recordings of each EDFA, and 4) a unique ID of the multi-span system, indicating the order of the EDFAs as well as the fiber span lengths. It also contains PSD readings from an OSA for the input PSD and the output PSD at the OSA wavelengths.

  11. Leash-Bio-processed-dataset

    • kaggle.com
    Updated May 26, 2024
    Cite
    hengck23 (2024). Leash-Bio-processed-dataset [Dataset]. https://www.kaggle.com/datasets/hengck23/leash-bio-processed-dataset
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 26, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    hengck23
    Description

    Processed dataset for https://www.kaggle.com/competitions/leash-BELKA.

    For any bz2 file, it is recommended to use a parallel bzip2 decompressor (https://github.com/mxmlnkn/indexed_bzip2) for speed.

    Last update : 22-may-2024

    In summary:

    See forum discussion for details of [1],[2]: https://www.kaggle.com/competitions/leash-BELKA/discussion/492846

    [1] reduced data

    • train.reduced.parquet : 98_415_610 training SMILES and their information
    • train.bind.npz : 98_415_610 x 3 target matrix
    • test.reduced.parquet : 878_022 test SMILES
    • all_buildingblock.csv: building blocks id used in train.reduced.parquet/test.reduced.parquet
    • fold0.parquet: train_share,valid_share,valid_nonshare splits for the experiments in the discussion

    [2] extracted ECFP4 fingerprints

    • train.ecfp4.packed.npz : Features extracted using rdkit
      • AllChem.GetMorganFingerprintAsBitVect(mol, 2, 2048)
      • repacked with np.packbits() to give a 98_415_610 x 256 feature matrix
    • test.ecfp4.packed.npz : similarly processed for the test SMILES

    This is somewhat obsolete as the competition progresses: ECFP6 gives better results and can be extracted quickly with scikit-fingerprints.
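    The bit-packing in [2] can be reversed with np.unpackbits(). A minimal sketch, with random bytes standing in for one chunk of train.ecfp4.packed.npz (the file's exact npz keys are not specified above):

```python
import numpy as np

# Each packed row of 256 uint8 bytes expands back to a 2048-bit ECFP4 vector,
# reversing the np.packbits() step described in [2]. The packed array below is
# random stand-in data, not the real fingerprint file.
packed = np.random.default_rng(0).integers(0, 256, size=(4, 256), dtype=np.uint8)
fingerprints = np.unpackbits(packed, axis=1)  # shape (4, 2048), values 0/1

# packing again reproduces the original bytes, so no information is lost
assert np.array_equal(np.packbits(fingerprints, axis=1), packed)
```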

    See forum discussion for details of [3]: https://www.kaggle.com/competitions/leash-BELKA/discussion/498858 https://www.kaggle.com/code/hengck23/lb6-02-graph-nn-example

    [3] graph NN processed data

    • test/train-replace-c.smiles.bytestring.bz2 : replace linker [Dy] with C. Note that these are bytestrings and not strings.
    • train-replace-c-30m.graph.pickle.**.b2z : 98_415_610 molecule graphs split into 3 files. Test graphs are not provided, as they are generated on the fly.

    See forum discussion for details of [4]: https://www.kaggle.com/competitions/leash-BELKA/discussion/505985 https://www.kaggle.com/code/hengck23/conforge-open-source-conformer-generator

    [4] conformer. i.e. molecule estimated xyz data

    • test-replace-c.conforge.sdf.bz2 : conformer in sdf file. you can read the file using rdkit Chem.SDMolSupplier().
    • test-replace-c.conforge.status.parquet:
      • The 'status' column shows the status of the conformer: 0 means success. For failure cases, the sdf stores a dummy 'CC' molecule.
      • The 'idx' column gives the index (primary key) into test.reduced.parquet; use this to retrieve SMILES strings. Note that the conformer is based on test-replace-c.smiles.bytestring.bz2, i.e. [Dy] is replaced by C.
    • train-replace-c.sub-[split].conforge.sdf.bz2/status.parquet: similar format as described above. [split] values are:
      • train: 1000250+(1001610*3) molecules
      • valid: 40000
      • nonshare: about 61674
  12. Patent AT-E401025-T1: [Translated] DEVICE FOR PREPARING A DRINK FROM A...

    • catalog.data.gov
    • data.virginia.gov
    Updated Sep 8, 2025
    Cite
    National Center for Biotechnology Information (NCBI) (2025). Patent AT-E401025-T1: [Translated] DEVICE FOR PREPARING A DRINK FROM A CARTRIDGE, WITH ACTIVATION AFTER READING AN OPTICAL CODE ON THE CARTRIDGE [Dataset]. https://catalog.data.gov/dataset/patent-at-e401025-t1-translated-device-for-preparing-a-drink-from-a-cartridge-with-activat
    Explore at:
    Dataset updated
    Sep 8, 2025
    Dataset provided by
    National Center for Biotechnology Information (NCBI)
    Description

    In an apparatus for the preparation of a plurality of drinks from cartridges (K) that are provided with an optical code (C) on one of their faces (F) identifying the cartridge (K) itself and the corresponding drink, the reading of the optical code (C) is made more certain and reliable thanks to a projecting reading window (40).

  13. Supplemental Material for Morton et al., 2023

    • gsajournals.figshare.com
    • datasetcatalog.nlm.nih.gov
    application/gzip
    Updated Mar 7, 2023
    Cite
    Elizabeth A. Morton; Ashley N. Hall; Josh T. Cuperus; Christine Queitsch (2023). Supplemental Material for Morton et al., 2023 [Dataset]. http://doi.org/10.25386/genetics.22197457.v1
    Explore at:
    application/gzip. Available download formats
    Dataset updated
    Mar 7, 2023
    Dataset provided by
    GSA Journals
    Authors
    Elizabeth A. Morton; Ashley N. Hall; Josh T. Cuperus; Christine Queitsch
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Whole genome sequencing was performed on C. elegans strains with different rDNA copy numbers.

    CB3740_eDp20_WS235_sort.bam – Aligned whole genome sequence data for C. elegans strain CB3740 (eDf24 I; eDp20 (I,II); mnT12 (IV,X))

    CB3740_eDP20_WS235_sort_depth.txt – Read depth analysis file for whole genome sequencing of C. elegans strain CB3740 (eDf24 I; eDp20 (I,II); mnT12 (IV,X))

    eDP20_chrI_13mil.txt – Read depth data for right arm of ChrI of C. elegans strain CB3740

    N2_chrI_13mil.txt – Read depth data for right arm of ChrI of C. elegans wild type strain N2

    SEA296_MY1_130E_chrI_merge_RG.bam – Aligned whole genome sequence data for C. elegans strain SEA296 (mIs13[myo-2p::GFP + pes-10p::GFP + F22B7.9p::GFP] I, catIR8[I:, N2>MY1]). Homozygous for 64-copy rDNA array.

    SEA296_MY1_130E_merge.g.vcf – VCF for sequence variants in C. elegans strain SEA296 (mIs13[myo-2p::GFP + pes-10p::GFP + F22B7.9p::GFP] I, catIR8[I:, N2>MY1]). Homozygous for 64-copy rDNA array.

    SEA300_duprm_RG.bam – Aligned whole genome sequence data for C. elegans strain SEA300 (catIR12[I:?-end , MY1>N2]). Homozygous for 417-copy rDNA array.

    SEA300_duprm_RG.g.vcf – VCF for sequence variants in C. elegans strain SEA300 (catIR12[I:?-end , MY1>N2]). Homozygous for 417-copy rDNA array.

    SEA302_S2_WS230_duprumRG.bam – Aligned whole genome sequence data for C. elegans strain SEA302 (catIR14[I:~13500000-end, JU775>N2]). Homozygous for 81-copy rDNA array.

    SEA302_S2_WS230_duprmRG.g.vcf – VCF for sequence variants in C. elegans strain SEA302 (catIR14[I:~13500000-end, JU775>N2]). Homozygous for 81-copy rDNA array.

    SEA305_S5_WS230_duprmRG.bam – Aligned whole genome sequence data for C. elegans strain SEA305 (catIR17[I:~3600000-end, MY16>N2]) Homozygous for 73-copy rDNA array.

    SEA305_S5_WS230_duprmRG.g.vcf – VCF for sequence variants in C. elegans strain SEA305 (catIR17[I:~3600000-end, MY16>N2]) Homozygous for 73-copy rDNA array.

  14. Kaggle Data Science Survey 2017-2021

    • kaggle.com
    zip
    Updated Nov 26, 2021
    Cite
    Andrada (2021). Kaggle Data Science Survey 2017-2021 [Dataset]. https://www.kaggle.com/datasets/andradaolteanu/kaggle-data-science-survey-20172021/code
    Explore at:
    zip (18555433 bytes). Available download formats
    Dataset updated
    Nov 26, 2021
    Authors
    Andrada
    Description

    Context

    I have created this dataset for an easier way to analyse the progression of answers from the respondents that are participating each year in the very famous Data Science Kaggle Survey.

    The sources of the present data are: * 2017: https://www.kaggle.com/kaggle/kaggle-survey-2017 * 2018: https://www.kaggle.com/kaggle/kaggle-survey-2018 * 2019: https://www.kaggle.com/c/kaggle-survey-2019/data * 2020: https://www.kaggle.com/c/kaggle-survey-2020/data * 2021: https://www.kaggle.com/c/kaggle-survey-2021/data

    Methodology

    This dataset was created by manually aggregating each of the 5 tables mentioned above. The full methodology was as follows:

    • The 2021 table was taken as the reference, as it is the latest and most up to date with regard to the questions and the overall evolution of the Data Science industry.
    • Each year, in descending order, was fully analysed one by one to find all questions (and answers) that matched those found in 2021.
    • As we go back in time, the questions lose their completeness more and more, so I would highly suggest analysing percentages per year rather than absolute numbers.

    The aggregation was done manually, as the questions order, naming and types of answers differ from one year to another. Hence, the most accurate way (although not the most efficient), was to read, order and pick the questions with regards to the base table (which was the 2021 Survey).
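    The aggregation step above can be sketched in a few lines of pandas; the question names and answers here are invented for illustration, not the real survey headers:

```python
import pandas as pd

# Align each year's survey to the 2021 question set (the reference table),
# tag the rows with their year, and stack everything into one long table.
base_questions = ["Q1", "Q2", "Q3"]  # the 2021 questions (illustrative names)

surveys = {
    2020: pd.DataFrame({"Q1": ["Python"], "Q3": ["PhD"]}),  # Q2 not asked in 2020
    2021: pd.DataFrame({"Q1": ["R"], "Q2": ["10+"], "Q3": ["MSc"]}),
}

frames = []
for year, df in surveys.items():
    aligned = df.reindex(columns=base_questions)  # missing questions become NaN
    aligned.insert(0, "Year", year)
    frames.append(aligned)

combined = pd.concat(frames, ignore_index=True)
```

Because earlier years are incomplete, the NaN columns make it easy to compute per-year percentages rather than absolute counts, as suggested above.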

    Content

    This dataset contains the following:

    • kaggle_survey_2017_2021.csv: the tabular dataset containing the aggregated data from 2017 to 2021.
    • style.css: a file that serves as custom styling for my notebook on this competition.
    • images folder: all images I have used for my notebook on this competition.

    Note: Notebook can be found here.

    Acknowledgements

    Thank you so much to the Kaggle Team for hosting these surveys and sharing with us all the data, so we can take the pulse of the community each year.

    Inspiration

    The Kaggle Survey is rich in information as is, but what can you find by adding another layer of information - the year? Evolutions in time could be fascinating.

  15. Data from: A consensus compound/bioactivity dataset for data-driven drug...

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated May 13, 2022
    Cite
    Laura Isigkeit; Laura Isigkeit; Apirat Chaikuad; Apirat Chaikuad; Daniel Merk; Daniel Merk (2022). A consensus compound/bioactivity dataset for data-driven drug design and chemogenomics [Dataset]. http://doi.org/10.5281/zenodo.6320761
    Explore at:
    zip. Available download formats
    Dataset updated
    May 13, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Laura Isigkeit; Laura Isigkeit; Apirat Chaikuad; Apirat Chaikuad; Daniel Merk; Daniel Merk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Information

    The diverse publicly available compound/bioactivity databases constitute a key resource for data-driven applications in chemogenomics and drug design. Analysis of their coverage of compound entries and biological targets revealed considerable differences, however, suggesting benefit of a consensus dataset. Therefore, we have combined and curated information from five esteemed databases (ChEMBL, PubChem, BindingDB, IUPHAR/BPS and Probes&Drugs) to assemble a consensus compound/bioactivity dataset comprising 1144803 compounds with 10915362 bioactivities on 5613 targets (including defined macromolecular targets as well as cell-lines and phenotypic readouts). It also provides simplified information on assay types underlying the bioactivity data and on bioactivity confidence by comparing data from different sources. We have unified the source databases, brought them into a common format and combined them, enabling an ease for generic uses in multiple applications such as chemogenomics and data-driven drug design.

    The consensus dataset provides increased target coverage and contains a higher number of molecules compared to the source databases which is also evident from a larger number of scaffolds. These features render the consensus dataset a valuable tool for machine learning and other data-driven applications in (de novo) drug design and bioactivity prediction. The increased chemical and bioactivity coverage of the consensus dataset may improve robustness of such models compared to the single source databases. In addition, semi-automated structure and bioactivity annotation checks with flags for divergent data from different sources may help data selection and further accurate curation.

    Structure and content of the dataset

    Dataset structure

    The dataset columns are: ChEMBL ID | PubChem ID | IUPHAR ID | Target | Activity type | Assay type | Unit | Mean C (0)... | Mean PC (0)... | Mean B (0)... | Mean I (0)... | Mean PD (0)... | Activity check annotation | Ligand names | Canonical SMILES C... | Structure check | Source

    The dataset was created using the Konstanz Information Miner (KNIME) (https://www.knime.com/) and was exported as a CSV-file and a compressed CSV-file.

    Except for the canonical SMILES columns, all columns are filled with the datatype ‘string’. The datatype for the canonical SMILES columns is the smiles-format. We recommend the File Reader node for using the dataset in KNIME. With the help of this node the data types of the columns can be adjusted exactly. In addition, only this node can read the compressed format.

    Column content:

    • ChEMBL ID, PubChem ID, IUPHAR ID: chemical identifier of the databases
    • Target: biological target of the molecule expressed as the HGNC gene symbol
    • Activity type: for example, pIC50
    • Assay type: Simplification/Classification of the assay into cell-free, cellular, functional and unspecified
    • Unit: unit of bioactivity measurement
    • Mean columns of the databases: mean of bioactivity values or activity comments denoted with the frequency of their occurrence in the database, e.g. Mean C = 7.5 *(15) -> the value for this compound-target pair occurs 15 times in ChEMBL database
    • Activity check annotation: a bioactivity check was performed by comparing values from the different sources and adding an activity check annotation to provide automated activity validation for additional confidence
      • no comment: bioactivity values are within one log unit;
      • check activity data: bioactivity values are not within one log unit;
      • only one data point: only one value was available, no comparison and no range calculated;
      • no activity value: no precise numeric activity value was available;
      • no log-value could be calculated: no negative decadic logarithm could be calculated, e.g., because the reported unit was not a compound concentration
    • Ligand names: all unique names contained in the five source databases are listed
    • Canonical SMILES columns: Molecular structure of the compound from each database
    • Structure check: To denote matching or differing compound structures in different source databases
      • match: molecule structures are the same between different sources;
      • no match: the structures differ;
      • 1 source: no structure comparison is possible, because the molecule comes from only one source database.
    • Source: From which databases the data come from
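    The one-log-unit comparison behind the activity check annotation can be sketched as a small helper function (a hypothetical illustration of the rule; the actual dataset was assembled in KNIME):

```python
# Hypothetical sketch of the activity check described above: bioactivity values
# from different source databases "agree" when they span at most one log unit.
def activity_check(values):
    vals = [v for v in values if v is not None]
    if not vals:
        return "no activity value"
    if len(vals) == 1:
        return "only one data point"
    return "no comment" if max(vals) - min(vals) <= 1.0 else "check activity data"
```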

  16. Police Department Stop Data

    • data.sfgov.org
    • s.cnmilf.com
    • +1more
    Updated Oct 27, 2025
    Cite
    (2025). Police Department Stop Data [Dataset]. https://data.sfgov.org/widgets/ubqf-aqzw?mobile_redirect=true
    Explore at:
    xml, xlsx, csv, kml, application/geo+json, kmz. Available download formats
    Dataset updated
    Oct 27, 2025
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    UPDATE 1/7/2025: On June 28th 2023, the San Francisco Police Department (SFPD) changed its Stops Data Collection System (SDCS). As a result of this change, record identifiers have changed from the Department of Justice (DOJ) identifier to an internal record numbering system (referred to as "LEA Record ID"). The data that SFPD uploads to the DOJ system will contain the internal record number which can be used for joins with the data available on DataSF.

    A. SUMMARY The San Francisco Police Department (SFPD) Stop Data was designed to capture information to comply with the Racial and Identity Profiling Act (RIPA), or California Assembly Bill (AB)953. SFPD officers collect specific information on each stop, including elements of the stop, circumstances and the perceived identity characteristics of the individual(s) stopped. The information obtained by officers is reported to the California Department of Justice. This dataset includes data on stops starting on July 1st, 2018, which is when the data collection program went into effect. Read the detailed overview for this dataset here.

    B. HOW THE DATASET IS CREATED By the end of each shift, officers enter all stop data into the Stop Data Collection System, which is automatically submitted to the California Department of Justice (CA DOJ). Once a quarter the Department receives a stops data file from CA DOJ. The SFPD conducts several transformations of this data to ensure privacy, accuracy and compliance with State law and regulation. For increased usability, text descriptions have also been added for several data fields which include numeric codes (including traffic, suspicion, citation, and custodial arrest offense codes, and actions taken as a result of a stop). See the data dictionaries below for explanations of all coded data fields. Read more about the data collection, and transformation, including geocoding and PII cleaning processes, in the detailed overview of this dataset.

    C. UPDATE PROCESS Information is updated on a quarterly basis.

    D. HOW TO USE THIS DATASET This dataset includes information about police stops that occurred, including some details about the person(s) stopped and what happened during the stop. Each row is a person stopped, with a record identifier for the stop and a unique identifier for the person. A single stop may involve multiple people and may produce more than one associated unique identifier for the same record identifier. A certain percentage of stops have stop information that can’t be geocoded. This may be due to errors in data input at the officer level (typos in entry, or providing an address that doesn't exist). More often, it is because officers provide a level of detail that isn't codable to a geographic coordinate, most often at the Airport (e.g. Terminal 3, door 22). In these cases, the location of the stop is coded as unknown.

    E. DATA DICTIONARIES CJIS Offense Codes data look up table

    Look up table for other coded data fields

  17. GOCE Satellite Telemetry

    • kaggle.com
    Updated Jul 15, 2024
    + more versions
    Cite
    astro_pat (2024). GOCE Satellite Telemetry [Dataset]. https://www.kaggle.com/datasets/patrickfleith/goce-satellite-telemetry
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    Kaggle
    Authors
    astro_pat
    Description

    Utilisation of this data is subject to European Space Agency's Earth Observation Terms and Conditions. Read T&C here

    This is Dataset Version 3 - Updates may be done following feedback from the machine learning community.

    Dataset Description

    This dataset contains 327 time series corresponding to the temporal values of 327 telemetry parameters over the life of the real GOCE satellite (from March 2009 to October 2013). It contains both the raw data and machine-learning-ready resampled data:

    • The raw values (calibrated values of each parameter) as {param}_raw.parquet files (irregular sampling)
    • Popular statistics computed over 10-minute windows for each parameter, as {param}_stats_10min.parquet files
    • Popular statistics computed over 6-hour windows for each parameter, as {param}_stats_6h.parquet files
    • metadata.csv: a list of all parameters with description, subsystem, first and last timestamp where a value is recorded, fraction of NaN in the calculated statistics, and the longest data gap
    • mass_properties.csv: information relative to the satellite mass (for example the remaining fuel on board)

    Why is it a good dataset for time series forecasting?

    • Real-world: the data originates from a real-world complex engineering system
    • Many variables: 327 allowing for multivariate time series forecasting.
    • Variables having engineering values and units (Volt, Ampere, bar, m, m/s, etc...). See the metadata
    • Different and irregular sampling rates: some parameters have a value recorded every second, others at a lower rate such as every 16 or 32 s. This is a challenge often encountered in real-world systems with sensor records, and it complicates the data pipelines and the input data fed into your models. If you want to start easy, work with the 10-min or 6-h resampled files.
    • Missing data and large gaps: you'll have to drop many parameters which have too much missing data, and carefully design and test your data processing, model training, and model evaluation strategy.
    • Suggested task 1: forecast 24 hrs ahead the 10-min last value given historical data
    • Suggested task 2: forecast 7 days ahead the 6-hour last value given historical data
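    Suggested task 1 can be framed as a standard supervised forecasting setup. A minimal sketch with a synthetic stand-in series (the real parquet files and column names differ):

```python
import numpy as np
import pandas as pd

# Forecasting the 10-min value 24 hours ahead means the target is the value
# 144 steps later (24 h / 10 min). A synthetic sine wave stands in for one
# resampled telemetry parameter here.
idx = pd.date_range("2010-01-01", periods=1000, freq="10min")
series = pd.Series(np.sin(np.arange(1000) / 50.0), index=idx, name="last_value")

horizon = 144                              # 24 h ahead at 10-min resolution
target = series.shift(-horizon).dropna()   # value observed 24 h later
history = series.loc[target.index]         # features aligned with each target
```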

    About the GOCE Satellite

    The Gravity Field and Steady-State Ocean Circulation Explorer (GOCE; pronounced ‘go-chay’), is a scientific mission satellite from the European Space Agency (ESA).

    Objectives

    GOCE's primary mission objective was to provide an accurate and detailed global model of Earth's gravity field and geoid. For this purpose, it is equipped with a state-of-the-art Gravity Gradiometer and precise tracking system.

    Payloads

    The satellite's main payload was the Electrostatic Gravity Gradiometer (EGG) to measure the gravity field of Earth. Other payloads were an onboard GPS receiver used as a Satellite-to-Satellite Tracking Instrument (SSTI) and a compensation system for all non-gravitational forces acting on the spacecraft. The satellite was also equipped with a laser retroreflector to enable tracking by ground-based satellite laser ranging stations.

    The satellite's unique arrow shape and fins helped keep GOCE stable as it flew through the thermosphere at a comparatively low altitude of 255 kilometres (158 mi). Additionally, an ion propulsion system continuously compensated for the variable deceleration due to air drag without the vibration of a conventional chemically powered rocket engine, thus limiting the errors in gravity gradient measurements caused by non-gravitational forces and restoring the path of the craft as closely as possible to a purely inertial trajectory.

    Thermal considerations

    Due to the orbit and satellite configuration, the solar panels experienced extreme temperature variations. The design therefore had to include materials that could tolerate temperatures as high as 160 degC and as low as -170 degC.

    Due to its stringent temperature stability requirements (for the gradiometer sensor heads, in the range of milli-Kelvin) the gradiometer was thermally decoupled from the satellite and had its own dedicated thermal-control system.

    Mission Operations

    Flight operations were conducted from the European Space Operations Centre, based in Darmstadt, Germany.

    It was launched on 17 March 2009 and came to the end of its mission on 21 October 2013 when it ran out of propellant. As planned, the satellite then began dropping out of orbit and made an uncontrolled re-entry on 11 November 2013.

    Orbit

    GOCE used a Sun-synchronous orbit with an inclination of 96.7 degrees, a mean altitude of approximately 263 km, an orbital period of 90 minutes, and a mean local solar time at the ascending node of 18:00.

    Resources

    • [Data Source](https://earth.esa....
  18. Data from: Novel mitochondrial genome rearrangements including duplications...

    • datadryad.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Dec 11, 2023
    Cite
    Bushra Fazal Minhas; Emily A. Beck; C.-H. Christina Cheng; Julian Catchen (2023). Novel mitochondrial genome rearrangements including duplications and extensive heteroplasmy could underlie temperature adaptations in Antarctic notothenioid fishes [Dataset]. http://doi.org/10.5061/dryad.9ghx3ffn0
    Explore at:
    zip. Available download formats
    Dataset updated
    Dec 11, 2023
    Dataset provided by
    Dryad
    Authors
    Bushra Fazal Minhas; Emily A. Beck; C.-H. Christina Cheng; Julian Catchen
    Time period covered
    Nov 28, 2023
    Description

    Data for "Novel mitochondrial genome rearrangements including duplications and extensive heteroplasmy could underlie temperature adaptations in Antarctic Notothenioid Fishes"

    Minhas BF, Beck EA, Cheng CC-H, Catchen, JM. (2022). Novel mitochondrial genome rearrangements including duplications and extensive heteroplasmy in Antarctic Notothenioid Fishes bioRxiv 2022.09.19.508608; doi: https://doi.org/10.1101/2022.09.19.508608

    Species

    Blackfin icefish

    Mitochondrial genome assembly and annotation for the white-blooded, Antarctic blackfin icefish (Chaenocephalus aceratus). Mt genome shows 3 tandemly duplicated ND6 copies and evidence of heteroplasmy.

    Pike Icefish

    Mitochondrial genome assembly and annotation for the white-blooded, secondarily temperate pike icefish (Champsocephalus esox). Mt genome shows 4 tandemly duplicated ND6 copies and evidence of heteroplasmy.

    Mackerel icefish

    Mitochondrial genome assembly and annotation for the white-blooded, ...

  19. Air quality - nitrogen dioxide - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Sep 22, 2015
    + more versions
    Cite
    ckan.publishing.service.gov.uk (2015). Air quality - nitrogen dioxide - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/air-quality-nitrogen-dioxide
    Explore at:
    Dataset updated
    Sep 22, 2015
    Dataset provided by
    CKAN (https://ckan.org/)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    Datasets showing nitrogen dioxide (NO2) levels at various locations around Leeds.

    Please note: from 17/02/17 this dataset will be archived and superseded by the ratified air quality dataset (https://datamillnorth.org/dataset/ratified-air-quality---nitrogen-dioxide), which contains corroborated data quality-checked by external auditors.

    Additional information: the data is collected on an hourly basis.
    Column A = date of collection (YYMMDD)
    Column B = time of collection
    Column C = reading
    Column D = validation (14 means the data has been validated, but not that it has been ratified)
    NOTE: the data is not necessarily collected for all dates/times/stations.

    Defra manage a further two stations in Leeds. You can be sent information through their air quality bulletin and request up to hourly information from http://uk-air.defra.gov.uk/bulletin-subscription. Archive CSV data can be downloaded from http://uk-air.defra.gov.uk/data/data_selector?=l&1=&s=&o=#mid
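The column layout described above maps directly onto a small parser. A hedged sketch, with invented sample rows (the real files come from the download links; any field conventions beyond the four columns stated here are assumptions):

```python
import csv
import io
from datetime import datetime

# Invented sample rows matching the stated layout:
# A = date (YYMMDD), B = time, C = reading, D = validation flag (14 = validated).
SAMPLE = """150101,01:00,38.2,14
150101,02:00,41.7,14
150101,03:00,35.0,9
"""

def parse_rows(text: str):
    """Yield (timestamp, reading, validated) tuples from raw CSV text."""
    for date_s, time_s, reading_s, flag_s in csv.reader(io.StringIO(text)):
        ts = datetime.strptime(f"{date_s} {time_s}", "%y%m%d %H:%M")
        yield ts, float(reading_s), int(flag_s) == 14

validated = [reading for ts, reading, ok in parse_rows(SAMPLE) if ok]
print(validated)   # only the readings whose validation flag is 14
```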

  20. c

    Public Dataset Access and Usage

    • s.cnmilf.com
    • data.sfgov.org
    • +2more
    Updated Oct 4, 2025
    Cite
    data.sfgov.org (2025). Public Dataset Access and Usage [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/public-dataset-access-and-usage
    Explore at:
    Dataset updated
    Oct 4, 2025
    Dataset provided by
    data.sfgov.org
    Description

    A. SUMMARY
    This dataset is used to report on public dataset access and usage within the open data portal. Each row sums the number of users who access a dataset each day, grouped by access type (API Read, Download, Page View, etc.).

    B. HOW THE DATASET IS CREATED
    This dataset is created by joining two internal analytics datasets generated by the SF Open Data Portal. We remove non-public information during the process.

    C. UPDATE PROCESS
    This dataset is scheduled to update every 7 days via ETL.

    D. HOW TO USE THIS DATASET
    This dataset can help you identify stale datasets, highlight the most popular datasets, and calculate other metrics around performance and usage in the open data portal. Please note a special call-out for two fields:
    - "derived": shows whether an asset is an original source (derived = "False") or is made from another asset through filtering (derived = "True").
    - "provenance": shows whether an asset is "official" (created by someone in the City of San Francisco) or "community" (created by a member of the community). All community assets are derived, as members of the community cannot add data to the open data portal.
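The "derived" and "provenance" fields described above suggest simple row-level filters and a staleness metric. A hedged sketch with invented sample rows (field names follow the description; everything else is an assumption for illustration):

```python
from datetime import date

# Invented rows in the per-day, per-access-type shape the description states.
rows = [
    {"dataset": "a", "day": date(2025, 9, 1), "access_type": "Download",
     "count": 12, "derived": "False", "provenance": "official"},
    {"dataset": "b", "day": date(2025, 3, 2), "access_type": "Page View",
     "count": 3, "derived": "True", "provenance": "community"},
]

def official_originals(rows):
    """Keep only non-derived assets created by the city itself."""
    return [r for r in rows
            if r["derived"] == "False" and r["provenance"] == "official"]

def last_access(rows):
    """Most recent access day per dataset -- a simple staleness signal."""
    latest = {}
    for r in rows:
        latest[r["dataset"]] = max(latest.get(r["dataset"], r["day"]), r["day"])
    return latest

print([r["dataset"] for r in official_originals(rows)])   # ['a']
```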

