35 datasets found

f
Data from: Time-Split Cross-Validation as a Method for Estimating the...
acs.figshare.com
txt
Updated Jun 2, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Robert P. Sheridan (2023). Time-Split Cross-Validation as a Method for Estimating the Goodness of Prospective Prediction. [Dataset]. http://doi.org/10.1021/ci400084k.s001
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.1021/ci400084k.s001
Dataset updated
Jun 2, 2023
Dataset provided by
ACS Publications
Authors
Robert P. Sheridan
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Cross-validation is a common method to validate a QSAR model. In cross-validation, some compounds are held out as a test set, while the remaining compounds form a training set. A model is built from the training set, and the test set compounds are predicted on that model. The agreement of the predicted and observed activity values of the test set (measured by, say, R2) is an estimate of the self-consistency of the model and is sometimes taken as an indication of the predictivity of the model. This estimate of predictivity can be optimistic or pessimistic compared to true prospective prediction, depending how compounds in the test set are selected. Here, we show that time-split selection gives an R2 that is more like that of true prospective prediction than the R2 from random selection (too optimistic) or from our analog of leave-class-out selection (too pessimistic). Time-split selection should be used in addition to random selection as a standard for cross-validation in QSAR model building.
R
Replication data for: "Split Decisions: Household Finance When a Policy...
dataverse.iza.org
dataverse.harvard.edu
Updated Jul 11, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Michael A. Clemens; Michael A. Clemens; Erwin R. Tiongson; Erwin R. Tiongson (2024). Replication data for: "Split Decisions: Household Finance When a Policy Discontinuity Allocates Overseas Work" [Dataset]. http://doi.org/10.7910/DVN/2DO8QP
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/2DO8QP
Dataset updated
Jul 11, 2024
Dataset provided by
Research Data Center of IZA (IDSC)
Authors
Michael A. Clemens; Michael A. Clemens; Erwin R. Tiongson; Erwin R. Tiongson
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Clemens, Michael A., and Tiongson, Erwin R., (2017) "Split Decisions: Household Finance When a Policy Discontinuity Allocates Overseas Work." Review of Economics and Statistics 99:3, 531-543.
d
Data from: Mixed-strain housing for female C57BL/6, DBA/2, and BALB/c mice:...
search.dataone.org
borealisdata.ca
Updated Dec 28, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mason, Georgia; Walker, Michael (2023). Mixed-strain housing for female C57BL/6, DBA/2, and BALB/c mice: Validating a split-plot design that promotes refinement and reduction [Dataset]. https://search.dataone.org/view/sha256%3A2b1ace7be31b90c0a2cf6859c8ec9dc108595d64d1ead30a0bfe0477100a52a8
Explore at:
Dataset updated
Dec 28, 2023
Dataset provided by
Borealis
Authors
Mason, Georgia; Walker, Michael
Time period covered
May 1, 2013 - Aug 1, 2013
Description
Validating a novel housing method for inbred mice: mixed-strain housing. To see if this housing method affected strain-typical mouse phenotypes, if variance in the data was affected, and how statistical power was increased through this split-plot design.
Val split & vocab file
kaggle.com
zip
Updated Jul 6, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Devi Hemamalini R (2024). Val split & vocab file [Dataset]. https://www.kaggle.com/datasets/devihemamalinir/val-split-and-vocab-file
Explore at:
zip(1603266139 bytes)Available download formats
Dataset updated
Jul 6, 2024
Authors
Devi Hemamalini R
Description
Dataset

This dataset was created by Devi Hemamalini R

Contents
Data from: Regression with Empirical Variable Selection: Description of a...
plos.figshare.com
txt
Updated Jun 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anne E. Goodenough; Adam G. Hart; Richard Stafford (2023). Regression with Empirical Variable Selection: Description of a New Method and Application to Ecological Datasets [Dataset]. http://doi.org/10.1371/journal.pone.0034338
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0034338
Dataset updated
Jun 8, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Anne E. Goodenough; Adam G. Hart; Richard Stafford
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Despite recent papers on problems associated with full-model and stepwise regression, their use is still common throughout ecological and environmental disciplines. Alternative approaches, including generating multiple models and comparing them post-hoc using techniques such as Akaike's Information Criterion (AIC), are becoming more popular. However, these are problematic when there are numerous independent variables and interpretation is often difficult when competing models contain many different variables and combinations of variables. Here, we detail a new approach, REVS (Regression with Empirical Variable Selection), which uses all-subsets regression to quantify empirical support for every independent variable. A series of models is created; the first containing the variable with most empirical support, the second containing the first variable and the next most-supported, and so on. The comparatively small number of resultant models (n = the number of predictor variables) means that post-hoc comparison is comparatively quick and easy. When tested on a real dataset – habitat and offspring quality in the great tit (Parus major) – the optimal REVS model explained more variance (higher R2), was more parsimonious (lower AIC), and had greater significance (lower P values), than full, stepwise or all-subsets models; it also had higher predictive accuracy based on split-sample validation. Testing REVS on ten further datasets suggested that this is typical, with R2 values being higher than full or stepwise models (mean improvement = 31% and 7%, respectively). Results are ecologically intuitive as even when there are several competing models, they share a set of “core” variables and differ only in presence/absence of one or two additional variables. We conclude that REVS is useful for analysing complex datasets, including those in ecology and environmental disciplines.
h
haoranxu_ALMA-13B-R-details
huggingface.co
Updated Jul 30, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Open LLM Leaderboard (2025). haoranxu_ALMA-13B-R-details [Dataset]. https://huggingface.co/datasets/open-llm-leaderboard/haoranxu_ALMA-13B-R-details
Explore at:
Dataset updated
Jul 30, 2025
Dataset authored and provided by
Open LLM Leaderboard
Description
Dataset Card for Evaluation run of haoranxu/ALMA-13B-R

Dataset automatically created during the evaluation run of model haoranxu/ALMA-13B-R The dataset is composed of 38 configuration(s), each one corresponding to one of the evaluated task. The dataset has been created from 1 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run.The "train" split is always pointing to the latest results. An additional… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/haoranxu_ALMA-13B-R-details.
d
Data from: FFT-split-operator code for solving the Dirac equation in 2+1...
elsevier.digitalcommonsdata.com
Updated Jun 1, 2008
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Guido R. Mocken (2008). FFT-split-operator code for solving the Dirac equation in 2+1 dimensions [Dataset]. http://doi.org/10.17632/43v3vvkwwf.1
Explore at:
Unique identifier
https://doi.org/10.17632/43v3vvkwwf.1
Dataset updated
Jun 1, 2008
Authors
Guido R. Mocken
License
https://www.elsevier.com/about/policies/open-access-licenses/elsevier-user-license/cpc-license/https://www.elsevier.com/about/policies/open-access-licenses/elsevier-user-license/cpc-license/
Description
Abstract The main part of the code presented in this work represents an implementation of the split-operator method [J.A. Fleck, J.R. Morris, M.D. Feit, Appl. Phys. 10 (1976) 129-160; R. Heather, Comput. Phys. Comm. 63 (1991) 446] for calculating the time-evolution of Dirac wave functions. It allows to study the dynamics of electronic Dirac wave packets under the influence of any number of laser pulses and its interaction with any number of charged ion potentials. The initial wave function can be eith...

Title of program: Dirac++ or (abbreviated) d++ Catalogue Id: AEAS_v1_0

Nature of problem The relativistic time evolution of wave functions according to the Dirac equation is a challenging numerical task. Especially for an electron in the presence of high intensity laser beams and/or highly charged ions, this type of problem is of considerable interest to atomic physicists.

Versions of this program held in the CPC repository in Mendeley Data AEAS_v1_0; Dirac++ or (abbreviated) d++; 10.1016/j.cpc.2008.01.042

This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2019)
h
details_CohereLabs_c4ai-command-r-plus-08-2024_private
huggingface.co
Updated May 23, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sasha Luccioni (2025). details_CohereLabs_c4ai-command-r-plus-08-2024_private [Dataset]. https://huggingface.co/datasets/sasha/details_CohereLabs_c4ai-command-r-plus-08-2024_private
Explore at:
Dataset updated
May 23, 2025
Authors
Sasha Luccioni
Description
Dataset Card for Evaluation run of CohereLabs/c4ai-command-r-plus-08-2024

Dataset automatically created during the evaluation run of model CohereLabs/c4ai-command-r-plus-08-2024. The dataset is composed of 3 configuration, each one corresponding to one of the evaluated task. The dataset has been created from 3 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run.The "train" split is always pointing to the… See the full description on the dataset page: https://huggingface.co/datasets/sasha/details_CohereLabs_c4ai-command-r-plus-08-2024_private.
Z
Full-length and split homologs of human proteins in the gut microbiome
data.niaid.nih.gov
Updated Nov 5, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rendina, Matthew; Turnbaugh, Peter; Bradley, Patrick (2024). Full-length and split homologs of human proteins in the gut microbiome [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_14037044
Explore at:
Dataset updated
Nov 5, 2024
Dataset provided by
University of California San Francisco
The Ohio State University
Authors
Rendina, Matthew; Turnbaugh, Peter; Bradley, Patrick
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These files were generated as part of the manuscript "Human xenobiotic metabolism proteins have full-length and split homologs in the gut microbiome" (submitted).

The .ipc files are tables of full-length (full_humcover3.ipc) and split homologs (part_humcover3.ipc) of human proteins in the gut microbiome. Note that our pipeline collapses full-length alignments to the same UHGP-90 protein family into a single entry per species, with the number of genomes reported in the column nGenomes. Split homologs are not collapsed because genomic context is used to define them, and this context may differ across individual genomes.

These files are in Arrow IPC format, which provides compression and fast I/O for large tables. We recommend reading them using pola.rs or the R Arrow package. In particular, because the full-length homolog table is large, you may wish to work with it without loading it into memory, which can be accomplished using scan_ipc in pola.rs or open_dataset in R Arrow.

We also provide gzipped .csv format datasets of full-length (pgkb_FH_drugs.csv.gz) and split (pgkb_SH_drugs.csv.gz) homologs organized by their PharmGKB annotations. For each drug annotated in PharmGKB as being metabolized by a human protein with full-length or split homologs, we provide the human protein(s) responsible, its xenobiotic enzyme class, the bacterial protein homolog(s), length and percent identity of the alignment, and either the specific genome (g, split homologs only) or the number of genomes (nGenomes, full homologs only). Xenobiotic enzyme classes are defined as in Figure 3 of the manuscript, with the additional classes "nucl" (nucleobase-containing metabolic proteins not annotated to any other class), "redox" (oxidoreductases not annotated to any other class), and "other" (all remaining proteins).
d
Data from: Split-beam Echo Sounder and Navigation Data Collected Using a...
catalog.data.gov
data.usgs.gov
+2more
Updated Nov 12, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2025). Split-beam Echo Sounder and Navigation Data Collected Using a Simrad EK80 Wide Band Transceiver and ES38-10 Transducer During the Mid-Atlantic Resource Imaging Experiment (MATRIX), USGS Field Activity 2018-002-FA [Dataset]. https://catalog.data.gov/dataset/split-beam-echo-sounder-and-navigation-data-collected-using-a-simrad-ek80-wide-band-transc
Explore at:
Dataset updated
Nov 12, 2025
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Description
In summer 2018, the U.S. Geological Survey partnered with the U.S Department of Energy and the Bureau of Ocean Energy Management to conduct the Mid-Atlantic Resources Imaging Experiment (MATRIX) as part of the U.S. Geological Survey Gas Hydrates Project. The field program objectives were to acquire high-resolution 2-dimensional multichannel seismic-reflection and split-beam echo sounder data along the U.S Atlantic margin between North Carolina and New Jersey to determine the distribution of methane gas hydrates in below-sea floor sediments and investigate potential connections between gas hydrate dynamics and sea floor methane seepage. MATRIX field work was carried out between August 8 and August 28, 2018 on the research vessel Hugh R. Sharp and resulted in acquisition of more than 2,000 track-line kilometers of multichannel seismic-reflection and co-located split-beam echo sounder data, along with wide-angle seismic reflection and refraction data from 63 expendable sonobuoy deployments.
f
The Pearson correlation coefficients (r) of diversity measures based on...
datasetcatalog.nlm.nih.gov
Updated Dec 4, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Abhari, Niloufar; Tupper, Paul; Mooers, Arne; Colijn, Caroline (2024). The Pearson correlation coefficients (r) of diversity measures based on heterozygosity and split system diversity applied on subsets of Atlantic salmon populations with size k = 2, 3, and 4. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001353332
Explore at:
Dataset updated
Dec 4, 2024
Authors
Abhari, Niloufar; Tupper, Paul; Mooers, Arne; Colijn, Caroline
Description
The Pearson correlation coefficients (r) of diversity measures based on heterozygosity and split system diversity applied on subsets of Atlantic salmon populations with size k = 2, 3, and 4.
a
Data from: 15 3 2
chatham-county-planning-subdivisions-and-rezonings-chathamncgis.hub.arcgis.com
Updated Apr 16, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Chatham County GIS Portal (2024). 15 3 2 [Dataset]. https://chatham-county-planning-subdivisions-and-rezonings-chathamncgis.hub.arcgis.com/datasets/15-3-2-3
Explore at:
Dataset updated
Apr 16, 2024
Dataset authored and provided by
Chatham County GIS Portal
Description
Attachment regarding a request by Strata Solar for a Conditional Use Permit on Parcel No. 12233, located of US 64 W, Hickory Mountain Township, for a solar farm on approximately 42 acres. The parcel is split between R-1 zoning and unzoned. The R-1 zoning is the portion subject to this CUP request which is approximately 23.3 acres.
Z
Data from: Long-term spatial memory, across large spatial scales, in...
data.niaid.nih.gov
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Priscila A Moura; Fletcher J Young; Monica Monllor; Marcio Z Cardoso; Stephen H Montgomery (2023). Long-term spatial memory, across large spatial scales, in Heliconius butterflies [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7985235
Explore at:
Dataset updated
May 30, 2023
Dataset provided by
Departamento de Ecologia, Instituto de Biologia, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil
School of Biological Sciences, University of Bristol, Bristol, UK
Departamento de Ecologia, Universidade Federal do Rio Grande do Norte, Natal, RN, Brazil
Authors
Priscila A Moura; Fletcher J Young; Monica Monllor; Marcio Z Cardoso; Stephen H Montgomery
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data accompanying "Long-term spatial memory, across large spatial scales, in Heliconius butterflies", Current Biology 2023:

exp1.csv. Behavioural data from experiment 1.

exp2.csv. Behavioural data from experiment 2.

exp3.csv. Behavioural data from experiment 3.

Exp1&2.csv. Behavioural data comparing experiment 1 and 2.

Exp1byDay.csv. Behavioural data for experiment 1 split by day.

Exp2byDay.csv. Behavioural data for experiment 2 split by day.

Exp3byDay.csv. Behavioural data for experiment 3 split by day.

exp1.R. R code for experiment 1 analysis.

exp2.R. R code for experiment 2 analysis.

exp3.R. R code for experiment 3 analysis.

exp1vsExp2.R. R code for comparing experiment 1 and 2.
d
Data from: Water Temperature of Lakes in the Conterminous U.S. Using the...
catalog.data.gov
data.usgs.gov
+1more
Updated Nov 13, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2025). Water Temperature of Lakes in the Conterminous U.S. Using the Landsat 8 Analysis Ready Dataset Raster Images from 2013-2023 [Dataset]. https://catalog.data.gov/dataset/water-temperature-of-lakes-in-the-conterminous-u-s-using-the-landsat-8-analysis-ready-2013
Explore at:
Dataset updated
Nov 13, 2025
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Area covered
Contiguous United States, United States
Description
This data release contains lake and reservoir water surface temperature summary statistics calculated from Landsat 8 Analysis Ready Dataset (ARD) images available within the Conterminous United States (CONUS) from 2013-2023. All zip files within this data release contain nested directories using .parquet files to store the data. The file example_script_for_using_parquet.R contains example code for using the R arrow package (Richardson and others, 2024) to open and query the nested .parquet files. Limitations with this dataset include: - All biases inherent to the Landsat Surface Temperature product are retained in this dataset which can produce unrealistically high or low estimates of water temperature. This is observed to happen, for example, in cases with partial cloud coverage over a waterbody. - Some waterbodies are split between multiple Landsat Analysis Ready Data tiles or orbit footprints. In these cases, multiple waterbody-wide statistics may be reported - one for each data tile. The deepest point values will be extracted and reported for tile covering the deepest point. A total of 947 waterbodies are split between multiple tiles (see the multiple_tiles = “yes” column of site_id_tile_hv_crosswalk.csv). - Temperature data were not extracted from satellite images with more than 90% cloud cover. - Temperature data represents skin temperature at the water surface and may differ from temperature observations from below the water surface. Potential methods for addressing limitations with this dataset: - Identifying and removing unrealistic temperature estimates: - Calculate total percentage of cloud pixels over a given waterbody as: percent_cloud_pixels = wb_dswe9_pixels/(wb_dswe9_pixels + wb_dswe1_pixels), and filter percent_cloud_pixels by a desired percentage of cloud coverage. - Remove lakes with a limited number of water pixel values available (wb_dswe1_pixels < 10) - Filter waterbodies where the deepest point is identified as water (dp_dswe = 1) - Handling waterbodies split between multiple tiles: - These waterbodies can be identified using the "site_id_tile_hv_crosswalk.csv" file (column multiple_tiles = “yes”). A user could combine sections of the same waterbody by spatially weighting the values using the number of water pixels available within each section (wb_dswe1_pixels). This should be done with caution, as some sections of the waterbody may have data available on different dates. All zip files within this data release contain nested directories using .parquet files to store the data. The example_script_for_using_parquet.R contains example code for using the R arrow package to open and query the nested .parquet files. - "year_byscene=XXXX.zip" – includes temperature summary statistics for individual waterbodies and the deepest points (the furthest point from land within a waterbody) within each waterbody by the scene_date (when the satellite passed over). Individual waterbodies are identified by the National Hydrography Dataset (NHD) permanent_identifier included within the site_id column. Some of the .parquet files with the byscene datasets may only include one dummy row of data (identified by tile_hv="000-000"). This happens when no tabular data is extracted from the raster images because of clouds obscuring the image, a tile that covers mostly ocean with a very small amount of land, or other possible. An example file path for this dataset follows: year_byscene=2023/tile_hv=002-001/part-0.parquet -"year=XXXX.zip" – includes the summary statistics for individual waterbodies and the deepest points within each waterbody by the year (dataset=annual), month (year=0, dataset=monthly), and year-month (dataset=yrmon). The year_byscene=XXXX is used as input for generating these summary tables that aggregates temperature data by year, month, and year-month. Aggregated data is not available for the following tiles: 001-004, 001-010, 002-012, 028-013, and 029-012, because these tiles primarily cover ocean with limited land, and no output data were generated. An example file path for this dataset follows: year=2023/dataset=lakes_annual/tile_hv=002-001/part-0.parquet - "example_script_for_using_parquet.R" – This script includes code to download zip files directly from ScienceBase, identify HUC04 basins within desired landsat ARD grid tile, download NHDplus High Resolution data for visualizing, using the R arrow package to compile .parquet files in nested directories, and create example static and interactive maps. - "nhd_HUC04s_ingrid.csv" – This cross-walk file identifies the HUC04 watersheds within each Landsat ARD Tile grid. -"site_id_tile_hv_crosswalk.csv" - This cross-walk file identifies the site_id (nhdhr{permanent_identifier}) within each Landsat ARD Tile grid. This file also includes a column (multiple_tiles) to identify site_id's that fall within multiple Landsat ARD Tile grids. - "lst_grid.png" – a map of the Landsat grid tiles labelled by the horizontal – vertical ID.
Data from: A split sex ratio in solitary and social nests of a facultatively...
zenodo.org
data.niaid.nih.gov
+2more
Updated May 31, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Adam R. Smith; Karen M. Kapheim; Callum J. Kingwell; William T. Wcislo; Adam R. Smith; Karen M. Kapheim; Callum J. Kingwell; William T. Wcislo (2022). Data from: A split sex ratio in solitary and social nests of a facultatively social bee [Dataset]. http://doi.org/10.5061/dryad.62dt334
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.62dt334
Dataset updated
May 31, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Adam R. Smith; Karen M. Kapheim; Callum J. Kingwell; William T. Wcislo; Adam R. Smith; Karen M. Kapheim; Callum J. Kingwell; William T. Wcislo
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
A classic prediction of kin selection theory is that a mixed population of social and solitary nests of haplodiploid insects should exhibit a split sex ratio among offspring: female biased in social nests, male biased in solitary nests. Here we provide the first evidence of a solitary-social split sex ratio, using the sweat bee Megalopta genalis (Halictidae). Data from 2502 offspring collected from naturally occurring nests across six years spanning the range of the M. genalis reproductive season show that despite significant yearly and seasonal variation, the offspring sex ratio of social nests is consistently more female biased than in solitary nests. This suggests that split sex ratios may facilitate the evolutionary origins of cooperation based on reproductive altruism via kin selection.
d
Dataset and R code: Genetic diversity of lion populations in Kenya:...
search.dataone.org
datadryad.org
Updated Jul 28, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mumbi Chege (2025). Dataset and R code: Genetic diversity of lion populations in Kenya: evaluating past management practices and recommendations for future conservation actions by Chege M et.al [Dataset]. http://doi.org/10.5061/dryad.s4mw6m9d8
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.s4mw6m9d8
Dataset updated
Jul 28, 2025
Dataset provided by
Dryad Digital Repository
Authors
Mumbi Chege
Description
The decline of lions (Panthera leo) in Kenya has raised conservation concerns on their overall population health and long-term survival. This study aimed to assess the genetic structure, differentiation, and diversity of lion populations in the country, while considering the influence of past management practices. Using a lion-specific Single Nucleotide Polymorphism (SNP) panel, we genotyped 171 individuals from 12 populations representative of areas with permanent lion presence. Our results revealed a distinct genetic pattern with pronounced population structure, confirmed a north-south split, and found no indication of inbreeding in any of the tested populations. Differentiation seems to be primarily driven by geographical barriers, human presence, and climatic factors, but management practices may have also affected the observed patterns. Notably, the Tsavo population displayed evidence of admixture, perhaps attributable to its geographic location as a suture zone, vast size, or to p..., This dataset was obtained from 12 kenyan lion populations. After DNA extraction, SNP genotyping was performed using an allele-specific KASP technique. The attached datasets includes the .txtÂ and .str versions of the autosomal SNPs to aid in reproducing the results.Â Â , , # dataset and r code associated with the publication entitled "Genetic diversity of lion populations in Kenya: evaluating past management practices and recommendations for future conservation actions" by Chege M et.al.

https://doi.org/10.5061/dryad.s4mw6m9d8

Â Â Â We provide the following description of the dataset and scripts for analysis carried out in R: We have split the data and scripts for ease of reference i.e.,

Â 1.) Script 1: titled â€˜***Calc_He_Ho_Ar_Fisâ€™***. For calculating the genetic diversity indices i.e. allelic richness (AR), Private alleles (AP), Inbreeding coefficients (FIS), expected (HE) and observed heterozygosity (HO). This script uses:

**â€œdata_HoHeAr.txtâ€ ** dataset. This dataset has information on individual samples, including their geographical area (population) of origin and the corresponding 335 autosomal single nucleotide polymorphism (SNP) reads.

â€˜***shompole2.txtâ€™*** Â this bears the dataset from the Shompol...
Machine learning pipeline to train toxicity prediction model of...
zenodo.org
data.niaid.nih.gov
zip
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jan Ewald; Jan Ewald (2020). Machine learning pipeline to train toxicity prediction model of FunTox-Networks [Dataset]. http://doi.org/10.5281/zenodo.3529162
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.3529162
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Jan Ewald; Jan Ewald
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Machine Learning pipeline used to provide toxicity prediction in FunTox-Networks

01_DATA # preprocessing and filtering of raw activity data from ChEMBL
- Chembl_v25 # latest activity assay data set from ChEMBL (retrieved Nov 2019)
- filt_stats.R # Filtering and preparation of raw data
- Filtered # output data sets from filt_stats.R
- toxicity_direction.csv # table of toxicity measurements and their proportionality to toxicity

02_MolDesc # Calculation of molecular descriptors for all compounds within the filtered ChEMBL data set
- datastore # files with all compounds and their calculated molecular descriptors based on SMILES
- scripts
- calc_molDesc.py # calculates for all compounds based on their smiles the molecular descriptors
- chemopy-1.1 # used python package for descriptor calculation as decsribed in: https://doi.org/10.1093/bioinformatics/btt105

03_Averages # Calculation of moving averages for levels and organisms as required for calculation of Z-scores
- datastore # output files with statistics calculated by make_Z.R
- scripts
-make_Z.R # script to calculate statistics to calculate Z-scores as used by the regression models

04_ZScores # Calculation of Z-scores and preparation of table to fit regression models
- datastore # Z-normalized activity data and molecular descriptors in the form as used for fitting regression models
- scripts
-calc_Ztable.py # based on activity data, molecular descriptors and Z-statistics, the learning data is calculated

05_Regression # Performing regression. Preparation of data by removing of outliers based on a linear regression model. Learning of random forest regression models. Validation of learning process by cross validation and tuning of hyperparameters.

- datastore # storage of all random forest regression models and average level of Z output value per level and organism (zexp_*.tsv)
- scripts
- data_preperation.R # set up of regression data set, removal of outliers and optional removal of fields and descriptors
- Rforest_CV.R # analysis of machine learning by cross validation, importance of regression variables and tuning of hyperparameters (number of trees, split of variables)
- Rforest.R # based on analysis of Rforest_CV.R learning of final models

rregrs_output
# early analysis of regression model performance with the package RRegrs as described in: https://doi.org/10.1186/s13321-015-0094-2
Black Jack - Interactive Card Game
kaggle.com
zip
Updated Dec 21, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Patrick L Ford (2024). Black Jack - Interactive Card Game [Dataset]. https://www.kaggle.com/datasets/patricklford/black-jack-interactive-card-game/code
Explore at:
zip(4873009 bytes)Available download formats
Dataset updated
Dec 21, 2024
Authors
Patrick L Ford
License
http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
Description
Introduction

Blackjack, also known as 21, is one of the most popular card games worldwide. Blackjack remains a favourite due to its mix of simplicity, luck, strategy, and fast paced game play, making it a staple in casinos.

Objective of Blackjack:

The goal of Blackjack is to have a hand value closer to 21 than the dealer's hand, without exceeding 21. If a player's hand exceeds 21, they "bust" and lose the round.

Card Values:

Number cards (2-10): These are worth their face value.

Face cards (Jack, Queen, King): Each is worth 10 points.

Ace: Can be worth either 1 or 11, depending on which value benefits the hand more without exceeding 21.

Setup:

Deck: Blackjack is typically played with one to eight standard decks of 52 cards.

Players: One or more players compete against the dealer. Each player is dealt a separate hand, and players do not compete against each other.

Table Layout: The table features spaces for player bets, cards, and chips.

Game Play:

Initial Bets:

Players place their bets in designated areas on the table.

Dealing Cards:

Each player and the dealer receive two cards.

Players' cards are dealt face-up, while the dealer gets one face-up card (up card) and one face-down card (hole card).

Player Options:

Hit: Request another card to add to their hand. Players can keep hitting until they are satisfied or bust.

Stand: Keep the current hand and end their turn.

Double Down: Double the initial bet and receive exactly one more card. Commonly allowed only on the first two cards.

Split: If the first two cards have the same rank, the player can split them into two separate hands by placing an additional bet equal to the original. Each hand is played separately.

Surrender (Optional Rule): Forfeit half the bet and end the turn. This is usually allowed only on the first two cards.

Insurance (Optional Rule): If the dealer's up card is an Ace, players may place a side bet (half the original bet) that the dealer has Blackjack. If the dealer has Blackjack, the insurance bet pays 2:1; otherwise, the player loses the insurance bet.

Dealer's Turn:

Hit until the hand value is 17 or higher.

Stand on 17 or higher (including "soft 17" in some variations).

The dealer does not have options; actions are automatic.

Winning:

Player Wins: The player's hand value is closer to 21 than the dealer's hand, or the dealer busts.

Dealer Wins: Dealer's hand value is closer to 21, or the player busts.

Push (Tie): Both hands have the same value; the player keeps their bet.

Blackjack (Natural):

If the player's initial two cards are an Ace and a 10-point card (Jack, Queen, King, or 10), they have a "Blackjack."

Blackjack typically pays 3:2 (e.g., a $10 bet wins $15).

If both the player and the dealer have Blackjack, it's a push.

House Edge and Strategy:

The casino typically has a small edge due to rules favouring the dealer (e.g., the player acts first, so they can bust before the dealer plays): - Basic strategy can minimise the house edge: - Strategy charts show the optimal play based on the player's hand and the dealer's up card. - Advanced players use card counting to track high value cards remaining in the deck, gaining an advantage.

Common Variations:

European Blackjack: Dealer receives only one card initially; no hole card until players complete their turns.

Spanish 21: Played with 48-card decks (no 10's), with bonuses for certain hands.

Pontoon: A British variation where "Five Card Trick" (five cards totalling 21 or less) is a winning hand.

Blackjack Switch: Players play two hands and can swap the second card between them.

Etiquette and Tips:

Use hand signals to indicate actions (e.g., tapping for "hit," waving for "stand").

Avoid touching chips after the deal starts.

Familiarise yourself with table-specific rules and variations.

Visualisation

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13231939%2Faa4b5d8819430e46c3203b3597666578%2FScreenshot%202024-12-21%2010.36.57.png?generation=1734781714095911&alt=media" alt=""> https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13231939%2F86038e4d98f429825106bb2e8b5f74e8%2FScreenshot%202024-12-21%2010.38.18.png?generation=1734781738030008&alt=media" alt=""> https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13231939%2F5b634959e2292840ce454745ca80062f%2FScreenshot%202024-12-21%2010.39.12.png?generation=1734781761032959&alt=media" alt="">

A Markdown document with the R code for the game of Black Jack. link

R Code

The provided R code implements a simplified version of the game Blackjack. It includes f...
Helpful Life Tips from Reddit Dataset (13K Tips)
kaggle.com
zip
Updated Oct 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
asaniczka (2023). Helpful Life Tips from Reddit Dataset (13K Tips) [Dataset]. https://www.kaggle.com/datasets/asaniczka/helpful-life-tips-from-reddit-dataset-13k-tips
Explore at:
zip(4186644 bytes)Available download formats
Dataset updated
Oct 1, 2023
Authors
asaniczka
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Discover truly valuable life tips shared by real humans.

About the Dataset:

Reddit is a treasure trove of genuine life experiences from millions of people. Subreddits like r/lifeProTips and r/YouShouldKnow are well-known for containing some of the best and most practical tips that anyone can apply to their life.

This dataset is a cleaned version of the split reddit dump by u/Watchful1.

Each row in the dataset contains a helpful life tip.

Interesting Task Ideas:

Develop a web app that presents users with an interesting tip each day.

Explore the data to determine the most popular types of tips.

Build a recommendation system that suggests relevant tips based on specific life situations or topics.

Develop AI-powered models that generate useful life tips using the examples in the dataset.

Analyze popular life topics and their corresponding tips to uncover patterns and common themes.

If you find this dataset valuable, don't forget to hit the upvote button! 😊💝

Checkout my other datasets

Gender Wage Gap in the USA

USA Hispanic-White Wage Gap Dataset

USA Unemployment Rates by Demographics & Race

USA Wage Comparison for College vs. High School

Employment-to-Population Ratio for USA
Publication data for "Symmetry breaking fluctuations split the porphyrin Q...
zenodo.org
application/gzip
Updated Oct 22, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tim Zuehlsdorff; Tim Zuehlsdorff (2024). Publication data for "Symmetry breaking fluctuations split the porphyrin Q bands" [Dataset]. http://doi.org/10.5281/zenodo.13975838
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.13975838
Dataset updated
Oct 22, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Tim Zuehlsdorff; Tim Zuehlsdorff
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This directory contains all files necessary to reproduce the data presented in "Symmetry breaking fluctuations split the porphyrin Q bands" by Z. R Wiethorn, K. E. Hunter, A. Montoya-Castillo and T. J. Zuehlsdorff.

The folder "Frequency_analysis" contains input and output files for Gaussian ground-state frequency calculations of porphine, TPP, and TPPL.

The folder "MD_simulations" contains raw trajectory files, as well as all input data necessary to reproduce QM/MM simulations of porphine, TPP and TPPL in CS2 solvent. Calculations are run using an interface between the classical MD code AMBER and thequantum chemistry code TeraChem. AMBER .paramtop and restart files, as well as Terachem files for the QM region are provided. Additionally, example terachem files for computing vertical excitation energies for individual snapshots are provided.

The folder spectra_generation contains raw data of transition dipole and energy gap fluctuationsalong the MD trajectory, after the appropriate Eckart rotation and determination of the correct sign of the dipole moment. It also contains input files for the MolSpeckPy code, that can beused to generate spectra in the GCT and GNCT schemes. Finished spectra, energy gap and dipole spectral densities that are analysed in the main text are also provided. Additionally, we provide input and output files to compute optical spectra for TPPL and TPPL without its phenol rings in the FCHT scheme implemented in Gaussian.

Facebook

Twitter

Click to copy link

Link copied

Cite

Robert P. Sheridan (2023). Time-Split Cross-Validation as a Method for Estimating the Goodness of Prospective Prediction. [Dataset]. http://doi.org/10.1021/ci400084k.s001

Data from: Time-Split Cross-Validation as a Method for Estimating the Goodness of Prospective Prediction.

Explore at:

txtAvailable download formats

Unique identifier

https://doi.org/10.1021/ci400084k.s001

Dataset updated

Jun 2, 2023

Dataset provided by

ACS Publications

Authors

Robert P. Sheridan

License

Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically

Description

Cross-validation is a common method to validate a QSAR model. In cross-validation, some compounds are held out as a test set, while the remaining compounds form a training set. A model is built from the training set, and the test set compounds are predicted on that model. The agreement of the predicted and observed activity values of the test set (measured by, say, R2) is an estimate of the self-consistency of the model and is sometimes taken as an indication of the predictivity of the model. This estimate of predictivity can be optimistic or pessimistic compared to true prospective prediction, depending how compounds in the test set are selected. Here, we show that time-split selection gives an R2 that is more like that of true prospective prediction than the R2 from random selection (too optimistic) or from our analog of leave-class-out selection (too pessimistic). Time-split selection should be used in addition to random selection as a standard for cross-validation in QSAR model building.

Clear search

Close search

Google apps

Main menu

Data from: Time-Split Cross-Validation as a Method for Estimating the...

Replication data for: "Split Decisions: Household Finance When a Policy...

Data from: Mixed-strain housing for female C57BL/6, DBA/2, and BALB/c mice:...

Val split & vocab file

Dataset

Contents

Data from: Regression with Empirical Variable Selection: Description of a...

haoranxu_ALMA-13B-R-details

Data from: FFT-split-operator code for solving the Dirac equation in 2+1...

details_CohereLabs_c4ai-command-r-plus-08-2024_private

Full-length and split homologs of human proteins in the gut microbiome

Data from: Split-beam Echo Sounder and Navigation Data Collected Using a...

The Pearson correlation coefficients (r) of diversity measures based on...

Data from: 15 3 2

Data from: Long-term spatial memory, across large spatial scales, in...

Data from: Water Temperature of Lakes in the Conterminous U.S. Using the...

Data from: A split sex ratio in solitary and social nests of a facultatively...

Dataset and R code: Genetic diversity of lion populations in Kenya:...

Machine learning pipeline to train toxicity prediction model of...

Black Jack - Interactive Card Game

Introduction

Objective of Blackjack:

Card Values:

Setup:

Game Play:

Common Variations:

Etiquette and Tips:

Visualisation

R Code

Helpful Life Tips from Reddit Dataset (13K Tips)

About the Dataset:

Interesting Task Ideas:

Checkout my other datasets

Publication data for "Symmetry breaking fluctuations split the porphyrin Q...

Data from: Time-Split Cross-Validation as a Method for Estimating the Goodness of Prospective Prediction.