14 datasets found
  1. Data from: Time-Split Cross-Validation as a Method for Estimating the...

    • acs.figshare.com
    txt
    Updated Jun 2, 2023
    Cite
    Robert P. Sheridan (2023). Time-Split Cross-Validation as a Method for Estimating the Goodness of Prospective Prediction. [Dataset]. http://doi.org/10.1021/ci400084k.s001
    Available download formats: txt
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    ACS Publications
    Authors
    Robert P. Sheridan
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Cross-validation is a common method to validate a QSAR model. In cross-validation, some compounds are held out as a test set, while the remaining compounds form a training set. A model is built from the training set, and the test set compounds are predicted on that model. The agreement of the predicted and observed activity values of the test set (measured by, say, R²) is an estimate of the self-consistency of the model and is sometimes taken as an indication of the predictivity of the model. This estimate of predictivity can be optimistic or pessimistic compared to true prospective prediction, depending on how compounds in the test set are selected. Here, we show that time-split selection gives an R² that is more like that of true prospective prediction than the R² from random selection (too optimistic) or from our analog of leave-class-out selection (too pessimistic). Time-split selection should be used in addition to random selection as a standard for cross-validation in QSAR model building.
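
The two selection schemes contrasted in the description above can be illustrated with a minimal Python sketch. The record layout (`id`, `date`), the function names, and the 25% test fraction are assumptions made for illustration and are not taken from the dataset. `r_squared` uses the coefficient-of-determination definition, which may differ from the R2 used in the paper.

```python
import random

def time_split(records, test_fraction=0.25):
    """Hold out the most recently dated records as the test set."""
    ordered = sorted(records, key=lambda r: r["date"])
    n_test = max(1, round(len(ordered) * test_fraction))
    return ordered[:-n_test], ordered[-n_test:]

def random_split(records, test_fraction=0.25, seed=0):
    """Hold out a random subset of records as the test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]

def r_squared(observed, predicted):
    """Coefficient of determination (an assumed definition of R2)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical compounds with ISO-formatted registration dates.
compounds = [{"id": i, "date": f"2010-01-{i + 1:02d}"} for i in range(8)]
train, test = time_split(compounds)
rand_train, rand_test = random_split(compounds)
```

With a time split, every test compound is dated later than every training compound, which mimics predicting compounds that did not exist when the model was built. A random split gives no such guarantee.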

  2. Replication data for: "Split Decisions: Household Finance When a Policy...

    • dataverse.iza.org
    • dataverse.harvard.edu
    Updated Jul 11, 2024
    Cite
    Michael A. Clemens; Erwin R. Tiongson (2024). Replication data for: "Split Decisions: Household Finance When a Policy Discontinuity Allocates Overseas Work" [Dataset]. http://doi.org/10.7910/DVN/2DO8QP
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Research Data Center of IZA (IDSC)
    Authors
    Michael A. Clemens; Erwin R. Tiongson
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Clemens, Michael A., and Tiongson, Erwin R., (2017) "Split Decisions: Household Finance When a Policy Discontinuity Allocates Overseas Work." Review of Economics and Statistics 99:3, 531-543.

  3. Data from: Regression with Empirical Variable Selection: Description of a...

    • plos.figshare.com
    txt
    Updated Jun 8, 2023
    Cite
    Anne E. Goodenough; Adam G. Hart; Richard Stafford (2023). Regression with Empirical Variable Selection: Description of a New Method and Application to Ecological Datasets [Dataset]. http://doi.org/10.1371/journal.pone.0034338
    Available download formats: txt
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Anne E. Goodenough; Adam G. Hart; Richard Stafford
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Despite recent papers on problems associated with full-model and stepwise regression, their use is still common throughout ecological and environmental disciplines. Alternative approaches, including generating multiple models and comparing them post-hoc using techniques such as Akaike's Information Criterion (AIC), are becoming more popular. However, these are problematic when there are numerous independent variables and interpretation is often difficult when competing models contain many different variables and combinations of variables. Here, we detail a new approach, REVS (Regression with Empirical Variable Selection), which uses all-subsets regression to quantify empirical support for every independent variable. A series of models is created; the first containing the variable with most empirical support, the second containing the first variable and the next most-supported, and so on. The comparatively small number of resultant models (n = the number of predictor variables) means that post-hoc comparison is comparatively quick and easy. When tested on a real dataset – habitat and offspring quality in the great tit (Parus major) – the optimal REVS model explained more variance (higher R²), was more parsimonious (lower AIC), and had greater significance (lower P values), than full, stepwise or all-subsets models; it also had higher predictive accuracy based on split-sample validation. Testing REVS on ten further datasets suggested that this is typical, with R² values being higher than full or stepwise models (mean improvement = 31% and 7%, respectively). Results are ecologically intuitive as even when there are several competing models, they share a set of “core” variables and differ only in presence/absence of one or two additional variables. We conclude that REVS is useful for analysing complex datasets, including those in ecology and environmental disciplines.

  4. Data from: Mixed-strain housing for female C57BL/6, DBA/2, and BALB/c mice:...

    • search.dataone.org
    • borealisdata.ca
    Updated Dec 28, 2023
    Cite
    Mason, Georgia; Walker, Michael (2023). Mixed-strain housing for female C57BL/6, DBA/2, and BALB/c mice: Validating a split-plot design that promotes refinement and reduction [Dataset]. https://search.dataone.org/view/sha256%3A2b1ace7be31b90c0a2cf6859c8ec9dc108595d64d1ead30a0bfe0477100a52a8
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Borealis
    Authors
    Mason, Georgia; Walker, Michael
    Time period covered
    May 1, 2013 - Aug 1, 2013
    Description

    Validation of a novel housing method for inbred mice: mixed-strain housing. The study tested whether this housing method affected strain-typical mouse phenotypes, whether it affected variance in the data, and how statistical power was increased through this split-plot design.

  5. Val split & vocab file

    • kaggle.com
    zip
    Updated Jul 6, 2024
    Cite
    Devi Hemamalini R (2024). Val split & vocab file [Dataset]. https://www.kaggle.com/datasets/devihemamalinir/val-split-and-vocab-file
    Available download formats: zip (1603266139 bytes)
    Dataset updated
    Jul 6, 2024
    Authors
    Devi Hemamalini R
    Description

    This dataset was created by Devi Hemamalini R.

  6. programming-languages-keywords

    • huggingface.co
    Updated Nov 13, 2023
    Cite
    BigCode (2023). programming-languages-keywords [Dataset]. https://huggingface.co/datasets/bigcode/programming-languages-keywords
    Available formats: Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 13, 2023
    Dataset authored and provided by
    BigCode
    Description

    Dataset Card for "programming-languages-keywords"

    Structured version of https://github.com/e3b0c442/keywords

    Generated using:

        r = requests.get("https://raw.githubusercontent.com/e3b0c442/keywords/main/README.md")
        keywords = r.text.split("### ")[1:]
        keywords = [i for i in keywords if not i.startswith("Sources")]
        keywords = {i.split(" ")[0]:[j for j in re.findall("[a-zA-Z]*", i.split(" ",1)[1]) if j] for i in keywords}
        keywords =…

    See the full description on the dataset page: https://huggingface.co/datasets/bigcode/programming-languages-keywords.

  7. Nonsense mutations and split genes in R. peacockii relative to R. rickettsii...

    • figshare.com
    • plos.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Roderick F. Felsheim; Timothy J. Kurtti; Ulrike G. Munderloh (2023). Nonsense mutations and split genes in R. peacockii relative to R. rickettsii Sheila Smith. [Dataset]. http://doi.org/10.1371/journal.pone.0008361.t002
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Roderick F. Felsheim; Timothy J. Kurtti; Ulrike G. Munderloh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Nonsense mutations and split genes in R. peacockii relative to R. rickettsii Sheila Smith.

  8. Data from: FFT-split-operator code for solving the Dirac equation in 2+1...

    • elsevier.digitalcommonsdata.com
    Updated Jun 1, 2008
    Cite
    Guido R. Mocken (2008). FFT-split-operator code for solving the Dirac equation in 2+1 dimensions [Dataset]. http://doi.org/10.17632/43v3vvkwwf.1
    Dataset updated
    Jun 1, 2008
    Authors
    Guido R. Mocken
    License

    https://www.elsevier.com/about/policies/open-access-licenses/elsevier-user-license/cpc-license/

    Description

    Abstract: The main part of the code presented in this work represents an implementation of the split-operator method [J.A. Fleck, J.R. Morris, M.D. Feit, Appl. Phys. 10 (1976) 129-160; R. Heather, Comput. Phys. Comm. 63 (1991) 446] for calculating the time-evolution of Dirac wave functions. It allows the study of the dynamics of electronic Dirac wave packets under the influence of any number of laser pulses and their interaction with any number of charged ion potentials. The initial wave function can be eith...

    Title of program: Dirac++ or (abbreviated) d++
    Catalogue Id: AEAS_v1_0

    Nature of problem: The relativistic time evolution of wave functions according to the Dirac equation is a challenging numerical task. Especially for an electron in the presence of high intensity laser beams and/or highly charged ions, this type of problem is of considerable interest to atomic physicists.

    Versions of this program held in the CPC repository in Mendeley Data: AEAS_v1_0; Dirac++ or (abbreviated) d++; 10.1016/j.cpc.2008.01.042

    This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2019)

  9. The Pearson correlation coefficients (r) of diversity measures based on...

    • datasetcatalog.nlm.nih.gov
    Updated Dec 4, 2024
    Cite
    Abhari, Niloufar; Tupper, Paul; Mooers, Arne; Colijn, Caroline (2024). The Pearson correlation coefficients (r) of diversity measures based on heterozygosity and split system diversity applied on subsets of Atlantic salmon populations with size k = 2, 3, and 4. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001353332
    Dataset updated
    Dec 4, 2024
    Authors
    Abhari, Niloufar; Tupper, Paul; Mooers, Arne; Colijn, Caroline
    Description

    The Pearson correlation coefficients (r) of diversity measures based on heterozygosity and split system diversity applied on subsets of Atlantic salmon populations with size k = 2, 3, and 4.

  10. Data from: 15 5 4

    • chatham-county-planning-subdivisions-and-rezonings-chathamncgis.hub.arcgis.com
    Updated Apr 16, 2024
    Close
    Cite
    Chatham County GIS Portal (2024). 15 5 4 [Dataset]. https://chatham-county-planning-subdivisions-and-rezonings-chathamncgis.hub.arcgis.com/datasets/15-5-4-1
    Dataset updated
    Apr 16, 2024
    Dataset authored and provided by
    Chatham County GIS Portal
    Description

    Attachment regarding a request by Strata Solar for a Conditional Use Permit (CUP) on Parcel No. 12233, located off US 64 W, Hickory Mountain Township, for a solar farm on approximately 42 acres. The parcel is split between R-1 zoning and unzoned land. The R-1 portion, approximately 23.3 acres, is the part subject to this CUP request.

  11. Global Mini Split Market Research Report: By Type (Wall-Mounted Mini Split...

    • wiseguyreports.com
    Updated Jan 3, 2025
    Cite
    (2025). Global Mini Split Market Research Report: By Type (Wall-Mounted Mini Split Systems, Ceiling Cassette Mini Split Systems, Floor-Mounted Mini Split Systems, Multi-Split Mini Split Systems), By Refrigerant Type (R-410A, R-32, R-22), By Installation Type (Single Zone, Multi Zone), By Application (Residential, Commercial, Industrial) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/cn/reports/mini-split-market
    Dataset updated
    Jan 3, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Sep 25, 2025
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2023
    REGIONS COVERED: North America, Europe, APAC, South America, MEA
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2024: 7.18 (USD Billion)
    MARKET SIZE 2025: 7.55 (USD Billion)
    MARKET SIZE 2035: 12.4 (USD Billion)
    SEGMENTS COVERED: Type, Refrigerant Type, Installation Type, Application, Regional
    COUNTRIES COVERED: US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICS: increasing energy efficiency demand, rising construction activities, technological advancements in HVAC, growing awareness of indoor air quality, shift towards sustainable cooling solutions
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: Trane, Fujitsu, Panasonic, Midea, Johnson Controls, Gree Electric Appliances, Haier, Daikin, Samsung Electronics, Carrier, LG Electronics, American Standard, Rheem, Toshiba, Mitsubishi Electric
    MARKET FORECAST PERIOD: 2025 - 2035
    KEY MARKET OPPORTUNITIES: Rising demand for energy efficiency, Growth in urbanization and housing, Adoption in commercial spaces, Increased focus on indoor air quality, Technological advancements in smart systems
    COMPOUND ANNUAL GROWTH RATE (CAGR): 5.1% (2025 - 2035)
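
As a quick consistency check, the stated growth rate can be recomputed from the 2025 and 2035 market sizes listed above. This sketch assumes the standard compound-annual-growth-rate formula and a ten-year span; the function name `cagr` is illustrative and does not come from the report.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1

# Market sizes in USD billion, as listed for 2025 and 2035.
rate = cagr(7.55, 12.4, 2035 - 2025)
print(f"{rate:.1%}")  # prints 5.1%
```

Under these assumptions the listed market sizes agree with the listed 5.1% rate.
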
  12. Divide and Recombine Approaches for Fitting Smoothing Spline Models with...

    • tandf.figshare.com
    zip
    Updated May 31, 2023
    Cite
    Danqing Xu; Yuedong Wang (2023). Divide and Recombine Approaches for Fitting Smoothing Spline Models with Large Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.5635045.v1
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Danqing Xu; Yuedong Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Spline smoothing is a widely used nonparametric method that allows data to speak for themselves. Due to its complexity and flexibility, fitting smoothing spline models is usually computationally intensive, which may become prohibitive with large datasets. To overcome memory and CPU limitations, we propose four divide and recombine (D&R) approaches for fitting cubic splines with large datasets. We consider two approaches to divide the data: random and sequential. For each approach of division, we consider two approaches to recombine. These D&R approaches are implemented in parallel without communication. Extensive simulations show that these D&R approaches are scalable and have performance comparable to the method that uses the whole data. The sequential D&R approaches are spatially adaptive, which leads to better performance than the method that uses the whole data when the underlying function is spatially inhomogeneous.
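
The two division schemes named in the description (random and sequential) can be sketched as follows. This covers only the division step, under the assumption that observations are indexed 0..n-1 in order of the predictor. The function names and the near-equal block sizes are illustrative choices, and the recombination approaches are not shown because the description does not specify them.

```python
import random

def divide_sequential(n, k):
    """Split indices 0..n-1 into k contiguous blocks of near-equal size."""
    bounds = [i * n // k for i in range(k + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(k)]

def divide_random(n, k, seed=0):
    """Assign indices 0..n-1 at random to k blocks of near-equal size."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    bounds = [i * n // k for i in range(k + 1)]
    return [sorted(perm[bounds[i]:bounds[i + 1]]) for i in range(k)]

seq_blocks = divide_sequential(10, 3)
rnd_blocks = divide_random(10, 3)
```

Each sequential block covers one contiguous stretch of the predictor range, while each random block is spread across the whole range. A model could then be fitted to each block independently before recombining.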

  13. Total IES-R scores split for psychological impact level- PIL.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Giovanni Fiorilli; Elisa Grazioli; Andrea Buonsenso; Giulia Di Martino; Tsopani Despina; Giuseppe Calcagno; Alessandra di Cagno (2023). Total IES-R scores split for psychological impact level- PIL. [Dataset]. http://doi.org/10.1371/journal.pone.0248345.t002
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Giovanni Fiorilli; Elisa Grazioli; Andrea Buonsenso; Giulia Di Martino; Tsopani Despina; Giuseppe Calcagno; Alessandra di Cagno
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Total IES-R scores split for psychological impact level- PIL.

  14. Vegetation variables in the case study dataset.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Anne E. Goodenough; Adam G. Hart; Richard Stafford (2023). Vegetation variables in the case study dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0034338.t001
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Anne E. Goodenough; Adam G. Hart; Richard Stafford
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Vegetation variables in the case study dataset.

