This dataset provides information about the number of properties, residents, and average property values for C Street cross streets in Reading, MA.
RACE-C: additional reading-comprehension data in the style of RACE (which covers high school and middle school exams), but at college level. https://github.com/mrcdata/race-c

@InProceedings{pmlr-v101-liang19a,
  title     = {A New Multi-choice Reading Comprehension Dataset for Curriculum Learning},
  author    = {Liang, Yichan and Li, Jianheng and Yin, Jian},
  booktitle = {Proceedings of The Eleventh Asian Conference on Machine Learning},
  pages     = {742--757},
  year      = {2019}
}
NNDSS - TABLE 1Q. Hepatitis B, perinatal infection to Hepatitis C, acute, Probable - 2022. In this Table, provisional cases* of notifiable diseases are displayed for United States, U.S. territories, and Non-U.S. residents.

Notes:
• These are weekly cases of selected infectious national notifiable diseases, from the National Notifiable Diseases Surveillance System (NNDSS). NNDSS data reported by the 50 states, New York City, the District of Columbia, and the U.S. territories are collated and published weekly as numbered tables available at https://www.cdc.gov/nndss/data-statistics/index.html. Cases reported by state health departments to CDC for weekly publication are subject to ongoing revision of information and delayed reporting. Therefore, numbers listed in later weeks may reflect changes made to these counts as additional information becomes available. Case counts in the tables are presented as published each week. See also the Guide to Interpreting Provisional and Finalized NNDSS Data at https://www.cdc.gov/nndss/docs/Readers-Guide-WONDER-Tables-20210421-508.pdf.
• Notices, errata, and other notes are available on the Notice To Data Users page at https://wonder.cdc.gov/nndss/NTR.html.
• The list of national notifiable infectious diseases and conditions and their national surveillance case definitions is available at https://ndc.services.cdc.gov/. This list incorporates the Council of State and Territorial Epidemiologists (CSTE) position statements approved by CSTE for national surveillance.

Footnotes:
*Case counts for reporting years 2021 and 2022 are provisional and subject to change. Cases are assigned to the reporting jurisdiction submitting the case to NNDSS, if the case's country of usual residence is the U.S., a U.S. territory, unknown, or null (i.e., country not reported); otherwise, the case is assigned to the 'Non-U.S. Residents' category. Country of usual residence is currently not reported by all jurisdictions or for all conditions. For further information on interpretation of these data, see https://www.cdc.gov/nndss/docs/Readers-Guide-WONDER-Tables-20210421-508.pdf.
†Previous 52 week maximum and cumulative YTD are determined from periods of time when the condition was reportable in the jurisdiction (i.e., may be less than 52 weeks of data or incomplete YTD data).
U: Unavailable — The reporting jurisdiction was unable to send the data to CDC or CDC was unable to process the data.
-: No reported cases — The reporting jurisdiction did not submit any cases to CDC.
N: Not reportable — The disease or condition was not reportable by law, statute, or regulation in the reporting jurisdiction.
NN: Not nationally notifiable — This condition was not designated as being nationally notifiable.
NP: Nationally notifiable but not published.
NC: Not calculated — There is insufficient data available to support the calculation of this statistic.
Cum: Cumulative year-to-date counts.
Max: Maximum case count during the previous 52 weeks.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
César E. Corona-González, Claudia Rebeca De Stefano-Ramos, Juan Pablo Rosado-Aíza, Fabiola R Gómez-Velázquez, David I. Ibarra-Zarate, Luz María Alonso-Valerdi
César E. Corona-González
https://orcid.org/0000-0002-7680-2953
a00833959@tec.mx
Psychophysiological data from Mexican children with learning difficulties who strengthen reading and math skills by assistive technology
2023
The current dataset consists of psychometric and electrophysiological data from children with reading or math learning difficulties. These data were collected to evaluate improvements in reading or math skills resulting from using an online learning method called Smartick.
The psychometric evaluations for children with reading difficulties encompassed: spelling tests, where 1) orthographic and 2) phonological errors were considered; 3) reading speed, expressed in words read per minute; and 4) reading comprehension, where multiple-choice questions were given to the children. The last two parameters were determined according to the standards of the Ministry of Public Education (Secretaría de Educación Pública in Spanish) in Mexico. The assessments for the math difficulties group comprised: 1) an assessment of general mathematical knowledge, as well as 2) the percentage of hits and 3) the reaction time in an arithmetical task. Additionally, selective attention and intelligence quotient (IQ) were also evaluated.
Then, individuals underwent an EEG experimental paradigm in which two conditions were recorded: 1) a 3-minute eyes-open resting state and 2) performing either reading or mathematical activities. EEG recordings from the reading experiment consisted of reading a text aloud and then answering questions about the text. EEG recordings from the math experiment involved solving two blocks of 20 arithmetic operations (addition and subtraction). Subsequently, each child was randomly assigned to either 1) the experimental group, who were asked to engage with Smartick for three months, or 2) the control group, who were not involved with the intervention. Once the 3-month period was over, every child was reassessed as described above.
The dataset contains a total of 76 subjects (sub-), from two study groups: 1) reading difficulties (R) and 2) math difficulties (M). Each individual was then subcategorized into the experimental subgroup (e), where children committed to engage with Smartick, or the control subgroup (c), where they did not get involved with any intervention.
Every subject was followed up on for three months. During this period, each subject underwent two EEG sessions, representing the PRE-intervention (ses-1) and the POST-intervention (ses-2).
The EEG recordings from the reading difficulties group comprised a resting-state condition (run-1) and active reading plus reading-comprehension activities (run-2). EEG data from the math difficulties group were collected during a resting-state condition (run-1) and while solving two blocks of 20 arithmetic operations (run-2 and run-3). All EEG files were stored in .set format. The nomenclature and description of the filenames are shown below:
| Nomenclature | Description |
|---|---|
| sub- | Subject |
| M | Math group |
| R | Reading group |
| c | Control subgroup |
| e | Experimental subgroup |
| ses-1 | PRE-intervention |
| ses-2 | POST-Intervention |
| run-1 | EEG for baseline |
| run-2 | EEG for reading activity, or the first block of math |
| run-3 | EEG for the second block of math |
Example: the file sub-Rc11_ses-1_task-SmartickDataset_run-2_eeg.set corresponds to:
- the 11th subject from the reading difficulties group, control subgroup (sub-Rc11)
- the EEG recording from the PRE-intervention (ses-1) while performing the reading activity (run-2)
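Since .set is EEGLAB's native format, any EEGLAB-compatible reader works. A minimal Python sketch using MNE-Python, assuming the example file above sits in the working directory:

```python
import mne

# Load the PRE-intervention reading-activity recording of subject Rc11
# (filename taken from the example above; preload=True pulls signals into memory).
raw = mne.io.read_raw_eeglab("sub-Rc11_ses-1_task-SmartickDataset_run-2_eeg.set", preload=True)
print(raw.info)          # channel names, sampling rate, etc.
data, times = raw[:, :]  # full signal array plus its time axis
```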
Psychometric data from the reading difficulties group:
Psychometric data from the math difficulties group:
Psychometric data can be found in the 01_Psychometric_Data.xlsx file
Engagement percentage can be found in the 05_SessionEngagement.xlsx file
Seventy-six Mexican children between 7 and 13 years old were enrolled in this study.
The sample was recruited through non-profit foundations that support learning and foster care programs.
g.USBamp RESEARCH amplifier
The stimuli nested folder contains all stimuli employed in the EEG experiments.
Level 1
- Math: Images used in the math experiment.
- Reading: Images used in the reading experiment.
Level 2
- Math
* POST_Operations: arithmetic operations from the POST-intervention.
* PRE_Operations: arithmetic operations from the PRE-intervention.
- Reading
* POST_Reading1: text 1 and text-related comprehension questions from the POST-intervention.
* POST_Reading2: text 2 and text-related comprehension questions from the POST-intervention.
* POST_Reading3: text 3 and text-related comprehension questions from the POST-intervention.
* PRE_Reading1: text 1 and text-related comprehension questions from the PRE-intervention.
* PRE_Reading2: text 2 and text-related comprehension questions from the PRE-intervention.
* PRE_Reading3: text 3 and text-related comprehension questions from the PRE-intervention.
Level 3
- Math
  * Operation01.jpg to Operation20.jpg: arithmetical operations solved during the first block of the math
```python
import cv2

# Read an original image and its corresponding ground-truth annotation.
original_image = cv2.imread('Original image/IMG-001.png')
ground_truth_image = cv2.imread('Ground truth/GT-001.png', cv2.IMREAD_GRAYSCALE)
```

When training models with deep learning frameworks (such as TensorFlow or PyTorch), the dataset path can be configured in the framework's dataset-loading class, according to its data loading mechanism, to ensure that the model can correctly read and process the images and their annotation data.
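For PyTorch specifically, here is a minimal Dataset sketch under the folder layout above; the class name and the IMG-to-GT filename pairing rule are illustrative assumptions, not part of the dataset's documentation:

```python
import os

import cv2
import torch
from torch.utils.data import Dataset


class SegmentationPairs(Dataset):
    """Pairs images in 'Original image/' with masks in 'Ground truth/'."""

    def __init__(self, root):
        self.image_dir = os.path.join(root, "Original image")
        self.mask_dir = os.path.join(root, "Ground truth")
        self.names = sorted(os.listdir(self.image_dir))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]                 # e.g. IMG-001.png
        mask_name = name.replace("IMG", "GT")  # assumed pairing: GT-001.png
        image = cv2.imread(os.path.join(self.image_dir, name))
        mask = cv2.imread(os.path.join(self.mask_dir, mask_name), cv2.IMREAD_GRAYSCALE)
        # HWC uint8 -> CHW float in [0, 1]; masks stay integer class labels.
        image = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0
        mask = torch.from_numpy(mask).long()
        return image, mask
```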
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LifeSnaps Dataset Documentation
Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in-the-wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral and psychological measurements, due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset containing a plethora of anthropological data, collected unobtrusively over a total course of more than 4 months by n=71 participants under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types, from second-level to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data openly available to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.
The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.
Data Import: Reading CSV
For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
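For example (the filename below is a placeholder; use the daily or hourly CSV you downloaded):

```python
import pandas as pd

daily = pd.read_csv("lifesnaps_daily.csv")  # hypothetical filename
print(daily.head())
```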
Data Import: Setting up a MongoDB (Recommended)
To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.
To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have the MongoDB Database Tools installed.
For the Fitbit data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c fitbit
For the SEMA data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c sema
For surveys data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c surveys
If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
Data Availability
The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:
{
_id:
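Once restored, the collections can be queried from Python with pymongo; a short sketch, assuming the local connection used in the mongorestore commands above:

```python
from pymongo import MongoClient

db = MongoClient("localhost", 27017)["rais_anonymized"]

doc = db["fitbit"].find_one()           # one sample document, in the format sketched above
count = db["sema"].count_documents({})  # size of the SEMA collection
print(doc, count)
```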
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is an extensive eye tracking dataset of 102 participants, each reading three Dutch literary short stories (7790 words per participant). The preprocessed data set includes: (1) a Fixation report (fixation-level); (2) a Saccade report; (3) an Interest Area report (word-level); (4) a Trial report (aggregated data for each page; stories were split up into 30 pages each); (5) a Sample report (the data were sampled at 500 Hz; this report includes data on all individual samples); (6) questionnaire data on reading experiences and other participant characteristics; and (7) word characteristics for all words in the stories (with the potential of calculating additional word characteristics).

The study for which this data set was collected explored the effect of simulation on reading behavior by means of eye tracking. We hypothesized (A) that simulation would lead to longer fixation times for parts of the text high in simulation-eliciting content. Additionally, we hypothesized (B) that we would find personal preferences in the reaction to different types of simulation-eliciting content. We expected (C) that the findings from the eye tracking data would be related to self-report of simulation. Finally, we expected (D) that the amount of simulation would be predictive of self-report of appreciation.

We found (A) longer reading times for perceptual and mental event simulation, but shorter reading times for motor simulation. The strength of the relationship between simulation and reading times varied between participants, but was positively correlated across the different types of simulation-eliciting content (B). Regarding (C) and (D), we found that this variation in the strength of the relationship between simulation and reading times was indeed related to aspects of self-reported simulation, absorption, and appreciation.

The findings from this study are described in detail in Mak & Willems (2019). https://doi.org/10.1080/23273798.2018.1552007
dataPOEM.csv

The dataPOEM.csv data set contains data on the level of each poem.
- scoresAes = factor scores of moving, beauty, and melodious ratings
- participant = participant number
- poemVersion = version of poem presented (A = original poem with rhyme and meter, B = poem variant with only rhyme, C = poem variant with only meter, D = poem variant without rhyme and meter)
- poemIdentity = poem number
- avgWFreq = average word frequency of poem
- totalGazeSlopeLineLength
- totalGazeWordMeanNAByWordLen
- totalGazeWordMeanNADiff
- order = order of presentation (1 = from A to D, 2 = from D to A; between-participant factor)
- firstFixDurMS_MINFIX_AVG = first fixation duration
- totalGazeMS_MINFIX_AVG = total gaze durations
- fixDurMS_MINFIX_NUM = number of fixations
- sacLenMS_MINFIX_AVG = average saccade length
- percRegMS_MINFIX_AVG = percentage of regressive eye movements
- pupilDial_AVG = average pupil dilation
- blink_NUM_TotalRT = number of blinks relative to total reading time
- totalReadingTime = total reading time of the poem
- areaTT = total score of the Aesthetic Responsiveness Assessment questionnaire
- dataIntegrity = percentage of valid position measurements by eye tracker during reading of a poem
- moving = rating of how moving the poem was
- beauty = rating of how beautiful the poem was
- melodious = rating of how melodious the poem was

dataROI.csv

The dataROI.csv data set contains data on the level of each line within a poem.
- order = order of presentation (1 = from A to D, 2 = from D to A; between-participant factor)
- participant = participant number
- poemIdentity = poem number
- lineNr = line number within poem
- poemVersion = version of poem presented (A = original poem with rhyme and meter, B = poem variant with only rhyme, C = poem variant with only meter, D = poem variant without rhyme and meter)
- verseEnd = whether a particular word/line was the last line of a stanza (0 = word/line within a stanza, 1 = last word/line of a stanza)
- BeginCloseRhyme = whether a particular line's final word marked the opening or closing of a rhyme pair (1 = opening of rhyme, 2 = closing of rhyme)
- lastFix = whether a particular line or word was the last one of the poem (0 = word/line within a poem, 1 = last word/line of poem)
- totalGazeByWordNA = total gaze duration of the final word of a line relative to word length
- gazeByLineLengthNA = total gaze duration of a line relative to line length
- dataIntegrity = percentage of valid position measurements by eye tracker during reading of a poem
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Citation metrics are widely used and misused. We have created a publicly available database of top-cited scientists that provides standardized information on citations, h-index, co-authorship-adjusted hm-index, citations to papers in different authorship positions, and a composite indicator (c-score). Separate data are shown for career-long and, separately, for single recent year impact. Metrics with and without self-citations and the ratio of citations to citing papers are given, and data on retracted papers (based on the Retraction Watch database) as well as citations to/from retracted papers have been added.

Scientists are classified into 22 scientific fields and 174 sub-fields according to the standard Science-Metrix classification. Field- and subfield-specific percentiles are also provided for all scientists with at least 5 papers. Career-long data are updated to end-of-2024 and single recent year data pertain to citations received during calendar year 2024. The selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field. This version (7) is based on the August 1, 2025 snapshot from Scopus, updated to the end of citation year 2024.

This work uses Scopus data. Calculations were performed using all Scopus author profiles as of August 1, 2025. If an author is not on the list, it is simply because the composite indicator value was not high enough to appear on the list. It does not mean that the author does not do good work. PLEASE ALSO NOTE THAT THE DATABASE HAS BEEN PUBLISHED IN AN ARCHIVAL FORM AND WILL NOT BE CHANGED. The published version reflects Scopus author profiles at the time of calculation. We thus advise authors to ensure that their Scopus profiles are accurate. REQUESTS FOR CORRECTIONS OF THE SCOPUS DATA (INCLUDING CORRECTIONS IN AFFILIATIONS) SHOULD NOT BE SENT TO US. They should be sent directly to Scopus, preferably by use of the Scopus to ORCID feedback wizard (https://orcid.scopusfeedback.com/), so that the correct data can be used in any future annual updates of the citation indicator databases.

The c-score focuses on impact (citations) rather than productivity (number of publications), and it also incorporates information on co-authorship and author positions (single, first, last author). If you have additional questions, see the attached file of FREQUENTLY ASKED QUESTIONS. Finally, we alert users that all citation metrics have limitations and their use should be tempered and judicious. For more reading, we refer to the Leiden Manifesto: https://www.nature.com/articles/520429a
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Scripts to read the data into Matlab are available here: https://github.com/myankov/EDFA-data-reading-scripts/

The dataset contains metadata, such as:
1) unique ID of the PSD profile
2) unique ID of the physical amplifier
3) total input and output power recordings of each EDFA
4) unique ID of the multi-span system, indicating the order of the EDFAs, as well as the fiber span lengths

It also contains PSD readings from an OSA for the input PSD and the output PSD at the OSA wavelengths.
For any bz2 file, it is recommended to use a parallel bzip2 decompressor (https://github.com/mxmlnkn/indexed_bzip2) for speed.
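A sketch of both routes in Python; the indexed_bzip2 call is an assumption based on that project's documented API, so verify it against the repository README:

```python
import bz2

# Single-threaded baseline from the standard library.
with bz2.open("data.csv.bz2", "rt") as f:  # hypothetical file name
    header = f.readline()

# Parallel variant (assumed API; see the indexed_bzip2 README):
# import os
# import indexed_bzip2 as ibz2
# with ibz2.open("data.csv.bz2", parallelization=os.cpu_count()) as f:
#     header = f.readline()
```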
In summary:
See forum discussion for details of [1],[2]: https://www.kaggle.com/competitions/leash-BELKA/discussion/492846
This has become somewhat obsolete as the competition progressed: ECFP6 gives better results and can be extracted quickly with scikit-fingerprints.
See forum discussion for details of [3]: https://www.kaggle.com/competitions/leash-BELKA/discussion/498858 https://www.kaggle.com/code/hengck23/lb6-02-graph-nn-example
See forum discussion for details of [4]: https://www.kaggle.com/competitions/leash-BELKA/discussion/505985 https://www.kaggle.com/code/hengck23/conforge-open-source-conformer-generator
In an apparatus for the preparation of a plurality of drinks from cartridges (K) that are provided with an optical code (C) on one of their faces (F) identifying the cartridge (K) itself and the corresponding drink, the reading of the optical code (C) is made more certain and reliable thanks to a projecting reading window (40).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Whole genome sequencing was performed on C. elegans strains with different rDNA copy numbers.
CB3740_eDp20_WS235_sort.bam – Aligned whole genome sequence data for C. elegans strain CB3740 (eDf24 I; eDp20 (I,II); mnT12 (IV,X))
CB3740_eDP20_WS235_sort_depth.txt – Read depth analysis file for whole genome sequencing of C. elegans strain CB3740 (eDf24 I; eDp20 (I,II); mnT12 (IV,X))
eDP20_chrI_13mil.txt – Read depth data for right arm of ChrI of C. elegans strain CB3740
N2_chrI_13mil.txt – Read depth data for right arm of ChrI of C. elegans wild type strain N2
SEA296_MY1_130E_chrI_merge_RG.bam – Aligned whole genome sequence data for C. elegans strain SEA296 (mIs13[myo-2p::GFP + pes-10p::GFP + F22B7.9p::GFP] I, catIR8[I:, N2>MY1]). Homozygous for 64-copy rDNA array.
SEA296_MY1_130E_merge.g.vcf – VCF for sequence variants in C. elegans strain SEA296 (mIs13[myo-2p::GFP + pes-10p::GFP + F22B7.9p::GFP] I, catIR8[I:, N2>MY1]). Homozygous for 64-copy rDNA array.
SEA300_duprm_RG.bam – Aligned whole genome sequence data for C. elegans strain SEA300 (catIR12[I:?-end , MY1>N2]). Homozygous for 417-copy rDNA array.
SEA300_duprm_RG.g.vcf – VCF for sequence variants in C. elegans strain SEA300 (catIR12[I:?-end , MY1>N2]). Homozygous for 417-copy rDNA array.
SEA302_S2_WS230_duprumRG.bam – Aligned whole genome sequence data for C. elegans strain SEA302 (catIR14[I:~13500000-end, JU775>N2]). Homozygous for 81-copy rDNA array.
SEA302_S2_WS230_duprmRG.g.vcf – VCF for sequence variants in C. elegans strain SEA302 (catIR14[I:~13500000-end, JU775>N2]). Homozygous for 81-copy rDNA array.
SEA305_S5_WS230_duprmRG.bam – Aligned whole genome sequence data for C. elegans strain SEA305 (catIR17[I:~3600000-end, MY16>N2]) Homozygous for 73-copy rDNA array.
SEA305_S5_WS230_duprmRG.g.vcf – VCF for sequence variants in C. elegans strain SEA305 (catIR17[I:~3600000-end, MY16>N2]) Homozygous for 73-copy rDNA array.
I have created this dataset to make it easier to analyse the progression of answers from the respondents who participate each year in the well-known Data Science Kaggle Survey.
The sources of the present data are: * 2017: https://www.kaggle.com/kaggle/kaggle-survey-2017 * 2018: https://www.kaggle.com/kaggle/kaggle-survey-2018 * 2019: https://www.kaggle.com/c/kaggle-survey-2019/data * 2020: https://www.kaggle.com/c/kaggle-survey-2020/data * 2021: https://www.kaggle.com/c/kaggle-survey-2021/data
This dataset was created by manually aggregating each of the 5 tables mentioned above. The full methodology was as follows:
The aggregation was done manually, as the question order, naming, and answer types differ from one year to another. Hence, the most accurate way (although not the most efficient) was to read, order, and pick the questions against the base table (the 2021 Survey).
This dataset contains the following:
- kaggle_survey_2017_2021.csv: the tabular dataset containing the aggregated data from 2017 to 2021.
- style.css: a file that serves as custom styling for my notebook on this competition.
- images folder: all images I have used for my notebook on this competition.

Note: Notebook can be found here.
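As a starting point for year-over-year comparisons, a short sketch; the 'year' column name is an assumption, so check the CSV header first:

```python
import pandas as pd

df = pd.read_csv("kaggle_survey_2017_2021.csv")
# Respondents per survey year (assumes a 'year' column marks the survey edition).
print(df["year"].value_counts().sort_index())
```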
Thank you so much to the Kaggle Team for hosting these surveys and sharing with us all the data, so we can take the pulse of the community each year.
The Kaggle Survey is rich in information as is, but what can you find by adding another layer of information, the year? Evolution over time could be fascinating.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Information
The diverse publicly available compound/bioactivity databases constitute a key resource for data-driven applications in chemogenomics and drug design. Analysis of their coverage of compound entries and biological targets revealed considerable differences, however, suggesting the benefit of a consensus dataset. Therefore, we have combined and curated information from five esteemed databases (ChEMBL, PubChem, BindingDB, IUPHAR/BPS and Probes&Drugs) to assemble a consensus compound/bioactivity dataset comprising 1,144,803 compounds with 10,915,362 bioactivities on 5,613 targets (including defined macromolecular targets as well as cell lines and phenotypic readouts). It also provides simplified information on the assay types underlying the bioactivity data and on bioactivity confidence by comparing data from different sources. We have unified the source databases, brought them into a common format, and combined them, enabling ease of generic use in multiple applications such as chemogenomics and data-driven drug design.
The consensus dataset provides increased target coverage and contains a higher number of molecules compared to the source databases which is also evident from a larger number of scaffolds. These features render the consensus dataset a valuable tool for machine learning and other data-driven applications in (de novo) drug design and bioactivity prediction. The increased chemical and bioactivity coverage of the consensus dataset may improve robustness of such models compared to the single source databases. In addition, semi-automated structure and bioactivity annotation checks with flags for divergent data from different sources may help data selection and further accurate curation.
Structure and content of the dataset
| ChEMBL ID | PubChem ID | IUPHAR ID | Target | Activity type | Assay type | Unit | Mean C (0) | ... | Mean PC (0) | ... | Mean B (0) | ... | Mean I (0) | ... | Mean PD (0) | ... | Activity check annotation | Ligand names | Canonical SMILES C | ... | Structure check | Source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
The dataset was created using the Konstanz Information Miner (KNIME) (https://www.knime.com/) and was exported as a CSV-file and a compressed CSV-file.
Except for the canonical SMILES columns, all columns are filled with the datatype 'string'. The datatype for the canonical SMILES columns is the SMILES format. We recommend the File Reader node for using the dataset in KNIME; with this node, the data types of the columns can be adjusted exactly. In addition, only this node can read the compressed format.
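Outside KNIME, the CSV export can be read with ordinary tooling; a Python sketch with a placeholder filename, forcing string columns to mirror the types described above:

```python
import pandas as pd

# dtype=str mirrors the note that all non-SMILES columns hold strings;
# pandas handles the compressed variant transparently via the .gz suffix.
df = pd.read_csv("consensus_compound_bioactivity.csv.gz", dtype=str)  # hypothetical filename
print(df.columns.tolist())
```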
Column content:
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
UPDATE 1/7/2025: On June 28th 2023, the San Francisco Police Department (SFPD) changed its Stops Data Collection System (SDCS). As a result of this change, record identifiers have changed from the Department of Justice (DOJ) identifier to an internal record numbering system (referred to as "LEA Record ID"). The data that SFPD uploads to the DOJ system will contain the internal record number which can be used for joins with the data available on DataSF.
A. SUMMARY The San Francisco Police Department (SFPD) Stop Data was designed to capture information to comply with the Racial and Identity Profiling Act (RIPA), or California Assembly Bill (AB)953. SFPD officers collect specific information on each stop, including elements of the stop, circumstances and the perceived identity characteristics of the individual(s) stopped. The information obtained by officers is reported to the California Department of Justice. This dataset includes data on stops starting on July 1st, 2018, which is when the data collection program went into effect. Read the detailed overview for this dataset here.
B. HOW THE DATASET IS CREATED By the end of each shift, officers enter all stop data into the Stop Data Collection System, which is automatically submitted to the California Department of Justice (CA DOJ). Once a quarter the Department receives a stops data file from CA DOJ. The SFPD conducts several transformations of this data to ensure privacy, accuracy and compliance with State law and regulation. For increased usability, text descriptions have also been added for several data fields which include numeric codes (including traffic, suspicion, citation, and custodial arrest offense codes, and actions taken as a result of a stop). See the data dictionaries below for explanations of all coded data fields. Read more about the data collection, and transformation, including geocoding and PII cleaning processes, in the detailed overview of this dataset.
C. UPDATE PROCESS Information is updated on a quarterly basis.
D. HOW TO USE THIS DATASET This dataset includes information about police stops that occurred, including some details about the person(s) stopped and what happened during the stop. Each row is a person stopped, with a record identifier for the stop and a unique identifier for the person. A single stop may involve multiple people and may produce more than one associated unique identifier for the same record identifier. A certain percentage of stops have stop information that can't be geocoded. This may be due to errors in data input at the officer level (typos in entry or providing an address that doesn't exist). More often, it is due to officers providing a level of detail that isn't codable to a geographic coordinate, most often at the Airport (i.e., Terminal 3, door 22). In these cases, the location of the stops is coded as unknown.
E. DATA DICTIONARIES CJIS Offense Codes data look up table
Utilisation of this data is subject to the European Space Agency's Earth Observation Terms and Conditions. Read the T&C here.
This is Dataset Version 3 - Updates may be done following feedback from the machine learning community.
This dataset contains 327 time series corresponding to the temporal values of 327 telemetry parameters over the life of the real GOCE satellite (from March 2009 to October 2013). It contains both the raw data and machine-learning ready-to-use resampled data (a Python loading sketch follows the file list):
- The raw values (calibrated values of each parameter) as {param}_raw.parquet files (irregular sampling)
- Resampled popular statistics computed over 10-minute windows for each parameter, as {param}_stats_10min.parquet files
- Resampled popular statistics computed over 6-hour windows for each parameter, as {param}_stats_6h.parquet files
- metadata.csv: a list of all parameters with description, subsystem, first and last timestamp where a value is recorded, fraction of NaN in the calculated statistics, and the longest data gap
- mass_properties.csv: information relative to the satellite mass (for example, the remaining fuel on board)
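The loading sketch (requires a parquet engine such as pyarrow; the assumption that the first metadata.csv column holds the parameter name should be checked against the actual file):

```python
import pandas as pd

meta = pd.read_csv("metadata.csv")
param = str(meta.iloc[0, 0])                             # assumes column 0 holds the parameter name
raw = pd.read_parquet(f"{param}_raw.parquet")            # irregular raw values
stats = pd.read_parquet(f"{param}_stats_10min.parquet")  # 10-minute window statistics
print(raw.shape, stats.shape)
```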
The Gravity Field and Steady-State Ocean Circulation Explorer (GOCE; pronounced ‘go-chay’), is a scientific mission satellite from the European Space Agency (ESA).
GOCE's primary mission objective was to provide an accurate and detailed global model of Earth's gravity field and geoid. For this purpose, it was equipped with a state-of-the-art gravity gradiometer and a precise tracking system.
The satellite's main payload was the Electrostatic Gravity Gradiometer (EGG), used to measure the gravity field of Earth. Other payloads were an onboard GPS receiver used as a Satellite-to-Satellite Tracking Instrument (SSTI) and a compensation system for all non-gravitational forces acting on the spacecraft. The satellite was also equipped with a laser retroreflector to enable tracking by ground-based satellite laser ranging stations.
The satellite's unique arrow shape and fins helped keep GOCE stable as it flew through the thermosphere at a comparatively low altitude of 255 kilometres (158 mi). Additionally, an ion propulsion system continuously compensated for the variable deceleration due to air drag without the vibration of a conventional chemically powered rocket engine, thus limiting the errors in gravity gradient measurements caused by non-gravitational forces and restoring the path of the craft as closely as possible to a purely inertial trajectory.
Due to the orbit and satellite configuration, the solar panels experienced extreme temperature variations. The design therefore had to include materials that could tolerate temperatures as high as 160 degC and as low as -170 degC.
Due to its stringent temperature stability requirements (for the gradiometer sensor heads, in the range of milli-Kelvin) the gradiometer was thermally decoupled from the satellite and had its own dedicated thermal-control system.
Flight operations were conducted from the European Space Operations Centre, based in Darmstadt, Germany.
It was launched on 17 March 2009 and came to an end of mission on 21 October 2013 because it ran out of propellant. As planned, the satellite then began dropping out of orbit and made an uncontrolled re-entry on 11 November 2013.
GOCE used a Sun-synchronous orbit with an inclination of 96.7 degrees, a mean altitude of approximately 263 km, an orbital period of 90 minutes, and a mean local solar time at the ascending node of 18:00.
Minhas BF, Beck EA, Cheng CC-H, Catchen JM. (2022). Novel mitochondrial genome rearrangements including duplications and extensive heteroplasmy in Antarctic notothenioid fishes. bioRxiv 2022.09.19.508608; doi: https://doi.org/10.1101/2022.09.19.508608
Mitochondrial genome assembly and annotation for the white-blooded, Antarctic blackfin icefish (Chaenocephalus aceratus). Mt genome shows 3 tandemly duplicated ND6 copies and evidence of heteroplasmy.
Mitochondrial genome assembly and annotation for the white-blooded, secondarily temperate pike icefish (Champsocephalus esox). Mt genome shows 4 tandemly duplicated ND6 copies and evidence of heteroplasmy.
Mitochondrial genome assembly and annotation for the white-blooded, ...
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Datasets showing nitrogen dioxide (NO2) levels at various locations around Leeds.

Please note: from 17/02/17 this dataset will be archived and superseded by the ratified air quality dataset (https://datamillnorth.org/dataset/ratified-air-quality---nitrogen-dioxide), which contains corroborated data quality-checked by external auditors.

Additional information: The data is collected on an hourly basis.
- Column A = Date of collection (YYMMDD)
- Column B = Time of collection
- Column C = Reading
- Column D = Validation (14 means the data has been validated, but not that it has been ratified)

NOTE: The data is not necessarily collected for all dates/times/stations. A parsing sketch appears after this description.

Defra air quality data: Defra manage a further two stations in Leeds. You can be sent information through their air quality bulletin and request up to hourly information from http://uk-air.defra.gov.uk/bulletin-subscription. Archive CSV data can be downloaded from http://uk-air.defra.gov.uk/data/data_selector?=l&1=&s=&o=#mid
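The parsing sketch for the four-column layout (the filename and the absence of a header row are assumptions; adjust to the actual file):

```python
import pandas as pd

no2 = pd.read_csv(
    "leeds_no2.csv",  # hypothetical filename
    header=None,
    names=["date", "time", "reading", "validation"],
    dtype={"date": str},  # keep leading zeros so YYMMDD parses correctly
)
no2["date"] = pd.to_datetime(no2["date"], format="%y%m%d")  # Column A is YYMMDD
print(no2.head())
```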
A. SUMMARY This dataset is used to report on public dataset access and usage within the open data portal. Each row sums the number of users who access a dataset each day, grouped by access type (API Read, Download, Page View, etc.).

B. HOW THE DATASET IS CREATED This dataset is created by joining two internal analytics datasets generated by the SF Open Data Portal. We remove non-public information during the process.

C. UPDATE PROCESS This dataset is scheduled to update every 7 days via ETL.

D. HOW TO USE THIS DATASET This dataset can help you identify stale datasets, highlight the most popular datasets, and calculate other metrics around performance and usage in the open data portal. Please note a special call-out for two fields:
- "derived": This field shows if an asset is an original source (derived = "False") or if it is made from another asset through filtering (derived = "True"). Essentially, whether it is derived from another source or not.
- "provenance": This field shows if an asset is "official" (created by someone in the City of San Francisco) or "community" (created by a member of the community, not official). All community assets are derived, as members of the community cannot add data to the open data portal.