This dataset contains the predicted prices of the asset Macrodata Refinement over the next 16 years. The projection initially uses a default annual growth rate of 5 percent. After the page loads, a slider lets the user adjust the growth rate to their own positive or negative projection. The adjustable growth rate ranges from -100 percent to 100 percent.
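The projection described above can be illustrated with a short sketch. It assumes the growth compounds once per year, which the description does not state explicitly; the function name `project_prices` and the starting price are hypothetical and are not taken from the dataset page.

```python
def project_prices(initial_price: float, annual_rate: float = 0.05, years: int = 16) -> list:
    """Return the projected price at the end of each year, 1 through `years`.

    Assumption: growth compounds annually. The rate is clamped to the
    slider range described in the text (-100 percent to +100 percent).
    """
    rate = max(-1.0, min(1.0, annual_rate))
    return [initial_price * (1.0 + rate) ** year for year in range(1, years + 1)]


if __name__ == "__main__":
    # Hypothetical starting price of 100.0 at the default 5 percent rate.
    for year, price in enumerate(project_prices(100.0), start=1):
        print(f"year {year:2d}: {price:.2f}")
```

Clamping mirrors the slider limits, so an out-of-range input behaves like the nearest slider end rather than raising an error.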

Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
The Pile -- NIHExPorter (refined by Data-Juicer)
A refined version of the NIHExPorter dataset in The Pile, produced by Data-Juicer. Some "bad" samples were removed from the original dataset to improve its quality. This dataset is usually used to pretrain a large language model. Note: this is a small subset for previewing. The whole dataset is available here (about 2.0G).
Dataset Information
Number of samples: 858,492 (keeps ~91.36% of the original dataset)
Refining… See the full description on the dataset page: https://huggingface.co/datasets/datajuicer/the-pile-nih-refined-by-data-juicer.
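The sample count and keep ratio listed above imply an approximate size for the original dataset. The following is a minimal sketch of that arithmetic, assuming the ratio means kept samples divided by original samples. The function name is hypothetical and is not part of Data-Juicer.

```python
def estimate_original_count(kept_samples: int, keep_ratio: float) -> int:
    """Estimate the original sample count from the kept count and keep ratio.

    Assumption: keep_ratio = kept_samples / original_samples. The published
    ratio is rounded, so the result is only an estimate.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in the interval (0, 1]")
    return round(kept_samples / keep_ratio)


if __name__ == "__main__":
    original = estimate_original_count(858_492, 0.9136)
    print("estimated original samples:", original)
    print("estimated removed samples:", original - 858_492)
```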

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
a. $R_{\mathrm{sym}} = \sum |I_i - \langle I \rangle| / \sum |I_i|$, where $I_i$ is the intensity of the $i$th measurement and $\langle I \rangle$ is the mean intensity for that reflection.
b. Reflections with $I > \sigma$ were used in the refinement.
c. $R_{\mathrm{work}} = \sum |F_{\mathrm{obs}} - F_{\mathrm{calc}}| / \sum |F_{\mathrm{obs}}|$, where $F_{\mathrm{calc}}$ and $F_{\mathrm{obs}}$ are the calculated and observed structure factor amplitudes, respectively.
d. $R_{\mathrm{free}}$: as for $R_{\mathrm{work}}$, but for 5–7% of the total reflections chosen at random and omitted from refinement.
e. Individual B-factor refinements were calculated.
* The high-resolution bin details are in parentheses.

Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
RedPajama -- ArXiv (refined by Data-Juicer)
A refined version of the ArXiv dataset in RedPajama, produced by Data-Juicer. Some "bad" samples were removed from the original dataset to improve its quality. This dataset is usually used to pretrain a large language model. Note: this is a small subset for previewing. The whole dataset is available here (about 85GB).
Dataset Information
Number of samples: 1,655,259 (keeps ~95.99% of the original dataset)
Refining Recipe… See the full description on the dataset page: https://huggingface.co/datasets/datajuicer/redpajama-arxiv-refined-by-data-juicer.

This worksheet displays the results of mineral abundance estimates based on Rietveld refinement of X-ray diffraction (XRD) analyses of mill tailings and other ore processing materials from worldwide localities. Data are also provided to show variation in mineral abundance estimates for subsplits in individual samples. Samples were analyzed on a PANalytical X'Pert Pro diffractometer with Cu radiation, and the results were interpreted using Highscore Plus v.4.7.

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
a. Data in parentheses pertain to the highest resolution shell (2.0 Å to 1.9 Å).
b. $R_{\mathrm{int}} = \sum |I - \langle I \rangle| / \sum I$, where $I$ is the observed intensity of a measured reflection and $\langle I \rangle$ is the mean intensity for all observations of symmetry-related reflections.
c. $R$ factor $= \sum |F_{oh} - F_{ch}| / \sum F_{oh}$, where $F_{oh}$ and $F_{ch}$ are the observed and calculated structure factor amplitudes for the 32,658 reflections $h$ that were used in structure refinement.
d. $R_{\mathrm{free}} = \sum |F_{oh} - F_{ch}| / \sum F_{oh}$, where $F_{oh}$ and $F_{ch}$ are the observed and calculated structure factor amplitudes for the 2,070 reflections $h$ that were not used in structure refinement.

Data collection and refinement statistics. Values in parentheses refer to the highest resolution shell.
a. $R_{\mathrm{merge}} = \sum_{hkl} \sum_i |I_i(hkl) - \langle I_{hkl} \rangle| / \sum_{hkl} \sum_i \langle I_{hkl} \rangle$
b. $R_{\mathrm{p.i.m.}} = \sum_{hkl} [1/(N-1)]^{1/2} \sum_i |I_i(hkl) - \langle I_{hkl} \rangle| / \sum_{hkl} \sum_i I_i(hkl)$
c. $R_{\mathrm{work}} = \sum ||F_{\mathrm{obs}}| - |F_{\mathrm{calc}}|| / \sum |F_{\mathrm{obs}}|$, where $F_{\mathrm{obs}}$ and $F_{\mathrm{calc}}$ are the observed and calculated structure factors, respectively. $R_{\mathrm{free}}$ corresponds to a randomly selected subset of 5% of reflections omitted during refinement.
d. Others refers to the Cl- ion present in the NUDT16 IMP-bound structure.
e. Values determined by MolProbity [39].
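The merging statistics in footnotes a and b above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used by any particular data-processing program: the mean intensity is unweighted, reflections measured only once are skipped (the 1/(N−1) factor is undefined for them), and the function name and input layout are hypothetical.

```python
from math import sqrt


def merging_r_factors(observations: dict) -> tuple:
    """Return (R_merge, R_pim) for grouped intensity measurements.

    `observations` maps a reflection index (h, k, l) to the list of
    intensities I_i(hkl) measured for that reflection.
    Assumptions: unweighted mean; reflections with N < 2 are skipped.
    """
    merge_numerator = 0.0
    pim_numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue
        mean_intensity = sum(intensities) / n
        abs_deviation = sum(abs(i - mean_intensity) for i in intensities)
        merge_numerator += abs_deviation
        pim_numerator += sqrt(1.0 / (n - 1)) * abs_deviation
        # With an unweighted mean, the sum of I_i equals the sum of <I> over i.
        denominator += sum(intensities)
    if denominator == 0.0:
        raise ValueError("no reflections with at least two observations")
    return merge_numerator / denominator, pim_numerator / denominator


if __name__ == "__main__":
    data = {(1, 0, 0): [90.0, 100.0, 110.0], (0, 1, 0): [48.0, 52.0]}
    r_merge, r_pim = merging_r_factors(data)
    print(f"R_merge = {r_merge:.4f}, R_pim = {r_pim:.4f}")
```

Because the two statistics share a denominator here, R_pim differs from R_merge only through the per-reflection multiplicity factor.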

barneylogo/refined-data dataset hosted on Hugging Face and contributed by the HF Datasets community

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
a. Numbers in parentheses indicate the outer-resolution shell.
b. $R_{\mathrm{merge}} = \sum_{hkl} \sum_i |I_i(hkl) - \langle I(hkl) \rangle| / \sum_{hkl} \sum_i I_i(hkl)$, where $I_i(hkl)$ is the $i$th observation of reflection $hkl$ and $\langle I(hkl) \rangle$ is the weighted average intensity for all observations $i$ of reflection $hkl$.
c. $R_{\mathrm{cryst}} = \sum_{hkl} |F_{\mathrm{obs}} - F_{\mathrm{calc}}| / \sum_{hkl} |F_{\mathrm{obs}}|$.
d. $R_{\mathrm{free}}$ is the same as $R_{\mathrm{cryst}}$ except for 5% of the data excluded from the refinement.
e. Sum of the TLS and residual B-factor contributions.

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
a. Values in parentheses apply to the high-resolution shell.
b. ; $N_h$, multiplicity for each reflection; $I_i$, the intensity of the $i$th observation of reflection $h$; , the mean of the intensity of all observations of reflection $h$, with ; is taken over all reflections; is taken over all observations of each reflection.
c. ; ; $R_{\mathrm{cryst}}$ and $R_{\mathrm{free}}$ were calculated using the working and test $hkl$ reflection sets, respectively.
d. Total refined protein residues equal 3172, from which 28 terminal amino acids (the N- and C-termini on the 9 chains, plus residues TS#399 and TS#409 (in chains A, B & C), Fab#27 and Fab#29 (in chain H), and Fab#137 and Fab#139 (in chain I), all flanking unmodeled gaps) were not included in the Ramachandran analysis (as implemented in Coot v 0.6.2-pre-1).

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
$R_{\mathrm{merge}} = \sum_h \sum_i |I_{h,i} - I_h| / \sum_h \sum_i I_{h,i}$ for the intensity ($I$) of observation $i$ of reflection $h$. $R$ factor $= \sum ||F_{\mathrm{obs}}| - |F_{\mathrm{calc}}|| / \sum |F_{\mathrm{obs}}|$, where $F_{\mathrm{obs}}$ and $F_{\mathrm{calc}}$ are the observed and calculated structure factors, respectively. $R_{\mathrm{free}}$ = $R$ factor calculated using 5% of the reflection data chosen randomly and omitted from the start of refinement. Rmsd, root-mean-square deviations from ideal geometry. Data for the highest resolution shell are shown in parentheses.
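The R factor and Rfree definitions above can be illustrated with a small sketch. It assumes the structure factor amplitudes are available as plain lists of numbers and that the free set is drawn uniformly at random. The function names, the seed, and the 5% default are illustrative choices, not the interface of any refinement program.

```python
import random


def r_factor(f_obs: list, f_calc: list) -> float:
    """Compute sum(||Fobs| - |Fcalc||) / sum(|Fobs|) over paired reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0.0:
        raise ValueError("sum of |Fobs| is zero")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator


def split_free_set(n_reflections: int, free_fraction: float = 0.05, seed: int = 0) -> tuple:
    """Randomly split reflection indices into (working, free) sets.

    The free set is held out before refinement starts, so Rfree is
    computed on reflections the model was never fitted against.
    """
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = max(1, round(n_reflections * free_fraction))
    return sorted(indices[n_free:]), sorted(indices[:n_free])


if __name__ == "__main__":
    work, free = split_free_set(100)
    print(len(work), "working reflections,", len(free), "free reflections")
    print("R factor:", r_factor([10.0, 20.0], [9.0, 22.0]))
```

Applying `r_factor` to the working indices gives the working R factor, and applying it to the free indices gives Rfree.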

These environmental DNA (eDNA) data and corresponding water quality data were collected and analyzed by the Fish and Wildlife Service in 2017. The samples were collected from 4 sites in pools 17 and 18 in the Upper Mississippi River on 3 sampling trips. The data were used to study occupancy modeling of eDNA data and to determine the optimal sampling effort required for reliable detection of invasive Bighead Carp and Silver Carp in streams with attributes similar to those of the Mississippi River.

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Technical remarks: This repository contains the supplementary data to our contribution "Particle Detection by means of Neural Networks and Synthetic Training Data Refinement in Defocusing Particle Tracking Velocimetry" to the 2022 Measurement Science and Technology special issue on the topic "Machine Learning and Data Assimilation techniques for fluid flow measurements". The data include annotated images used to train neural networks for particle detection on DPTV recordings, as well as unannotated particle images used to train the image-to-image translation networks that generate refined synthetic training data, as presented in the manuscript. The neural networks for particle detection trained on these data are also contained in this repository. An explanation of how to use the data and the trained neural networks, including an example script, can be found on GitHub (https://github.com/MaxDreisbach/DPTV_ML_Particle_detection).

Bacterial antibiotic resistance remains an ever-increasing worldwide problem, requiring new approaches and enzyme targets. Acinetobacter baumannii is recognised as one of the most significant antibiotic-resistant bacteria, capable of carrying up to 45 different resistance genes, and new drug discovery targets for this organism are an urgent priority. Short-chain dehydrogenase/reductase (SDR) enzymes are a large protein family with >60,000 members involved in numerous biosynthesis pathways. Here, we determined the structure of an SDR protein from A. baumannii and assessed putative cofactors by comparison with previously co-crystallised enzymes and cofactors. This study provides a basis for future studies to examine these potential cofactors in vitro.

Data on petroleum inputs, production, yield, and capacity. Weekly, monthly, and annual data are available. Users of the EIA API are required to obtain an API key via this registration form: http://www.eia.gov/beta/api/register.cfm

A variety of datasets for analysis of High Wealth individuals to assist HMRC's High Net Worth Unit in maintaining and refining its population. Matches 10 years of Inheritance Tax Data to the relevant in-life SA data. Updated: ad hoc.

This data set contains files included in the detailed instructional demonstration paper submitted to Integrating Materials and Manufacturing Innovation. The paper documents how to configure and carry out a repeatable Rietveld refinement with the software MAUD. The data set provides diffraction data from two different neutron diffraction measurements, crystallographic information files, and configuration files for the refinement process. The authors provide this data set to give new users of MAUD a better user experience and to offer a series of training opportunities, so that users understand how the software operates rather than treating it as a black box.

Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
The Pile -- PubMed Central (refined by Data-Juicer)
A refined version of the PubMed Central dataset in The Pile, produced by Data-Juicer. Some "bad" samples were removed from the original dataset to improve its quality. This dataset is usually used to pretrain a large language model. Note: this is a small subset for previewing. The whole dataset is available here (about 83G).
Dataset Information
Number of samples: 2,694,860 (keeps ~86.96% of the original dataset)… See the full description on the dataset page: https://huggingface.co/datasets/datajuicer/the-pile-pubmed-central-refined-by-data-juicer.

Adaptive Model Refinement for the Ionosphere and Thermosphere. Anthony M. D'Amato, Aaron J. Ridley, and Dennis S. Bernstein. Abstract: Mathematical models of physical phenomena are of critical importance in virtually all applications of science and technology. This paper addresses the problem of how to use data to improve the fidelity of a given model. We approach this problem using retrospective cost optimization, a novel technique that uses data to recursively update an unknown subsystem interconnected to a known system. This research is relevant to a wide range of applications that depend on large-scale models based on first-principles physics, such as the Global Ionosphere-Thermosphere Model (GITM). Using GITM as the truth model, we demonstrate that measurements can be used to identify unknown physics. Specifically, we estimate static thermal conductivity parameters, and we identify a dynamic cooling process.