44 datasets found
  1. Standardization in Quantitative Imaging: A Multi-center Comparison of...

    • cancerimagingarchive.net
    • stage.cancerimagingarchive.net
    • +1 more
    n/a, nifti and zip +1
    Cite
    The Cancer Imaging Archive, Standardization in Quantitative Imaging: A Multi-center Comparison of Radiomic Feature Values [Dataset]. http://doi.org/10.7937/tcia.2020.9era-gg29
    Explore at:
    xlsx, n/a, nifti and zip
    Available download formats
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Jun 9, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    This dataset was used by the NCI's Quantitative Imaging Network (QIN) PET-CT Subgroup for their project titled: Multi-center Comparison of Radiomic Features from Different Software Packages on Digital Reference Objects and Patient Datasets. The purpose of this project was to assess the agreement among radiomic features when computed by several groups using different software packages under very tightly controlled conditions, which included common image data sets and standardized feature definitions. The image datasets (and Volumes of Interest, VOIs) provided here are the same ones used in that project and reported in the publication listed below (ISSN 2379-1381, https://doi.org/10.18383/j.tom.2019.00031). In addition, we have provided detailed information about the software packages used (Table 1 in that publication) as well as the individual feature value results for each image dataset and each software package that were used to create the summary tables (Tables 2, 3 and 4) in that publication. For that project, nine common quantitative imaging features were selected for comparison, including features that describe morphology, intensity, shape, and texture, which are described in detail by the Image Biomarker Standardisation Initiative (IBSI; https://arxiv.org/abs/1612.07003; and Zwanenburg A, Vallières M, et al., The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-based Phenotyping. Radiology. 2020 May;295(2):328-338. doi: https://doi.org/10.1148/radiol.2020191145). There are three datasets provided: two image datasets and one dataset consisting of four Excel spreadsheets containing feature values.

    1. The first image dataset is a set of three Digital Reference Objects (DROs) used in the project: (a) a sphere with uniform intensity, (b) a sphere with intensity variation, and (c) a nonspherical (but mathematically defined) object with uniform intensity. These DROs were created by the team at Stanford University, are described in (Jaggi A, Mattonen SA, McNitt-Gray M, Napel S. Stanford DRO Toolkit: digital reference objects for standardization of radiomic features. Tomography. 2019;6:–.), and are a subset of the DROs described in the DRO Toolkit. Each DRO is represented in both DICOM and NIfTI formats, and the VOI is provided in each format as well (as a DICOM Segmentation Object (DSO) and as a NIfTI segmentation boundary).
    2. The second image dataset is the set of 10 patient CT scans, originating from the LIDC-IDRI dataset, that were used in the QIN multi-site collection of Lung CT data with Nodule Segmentations project (https://doi.org/10.7937/K9/TCIA.2015.1BUVFJR7). In that QIN study, a single lesion from each case was identified for analysis, and then nine VOIs were generated using three repeat runs of three segmentation algorithms (one from each of three academic institutions) on each lesion. To eliminate one source of variability in our project, only one of the VOIs previously created for each lesion was identified, and all sites used that same VOI definition. The specific VOI chosen for each lesion was the first run of the first algorithm (algorithm 1, run 1). DICOM images were provided for each dataset, and the VOI was provided in both DICOM Segmentation Object (DSO) and NIfTI segmentation formats.
    3. The third dataset is a collection of four Excel spreadsheets, each of which contains detailed information corresponding to one of the four tables in the cited publication (https://doi.org/10.18383/j.tom.2019.00031), including the raw feature values behind the summary tables (Tables 2, 3 and 4). These tables are:
    - Software Package details: detailed information about the software packages used in the study (and listed in Table 1 in the publication), including version number and any parameters specified in the calculation of the reported features.
    - DRO results: the original feature values obtained by each software package for each DRO, as well as the table summarizing results across software packages (Table 2 in the publication).
    - Patient Dataset results: the original feature values for each software package for each patient dataset (1 lesion per case), as well as the table summarizing results across software packages and patient datasets (Table 3 in the publication).
    - Harmonized GLCM Entropy Results: the values for the “Harmonized” GLCM Entropy feature for each patient dataset and each software package, as well as the summary across software packages (Table 4 in the publication).
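    To make the kind of feature being compared concrete, GLCM entropy (the feature behind Table 4) can be sketched as below. This is a minimal illustration under assumed choices (equal-width binning, a single symmetric offset), not the implementation of any package in the study; exactly these discretization and aggregation choices are what the IBSI definitions standardize.

```python
import numpy as np

def glcm_entropy(image, levels=8, offset=(0, 1)):
    """Toy gray-level co-occurrence matrix (GLCM) entropy.

    Equal-width discretization into `levels` gray levels and one
    symmetric pixel offset are illustrative assumptions; real
    radiomics packages differ in exactly these choices.
    """
    img = np.asarray(image, dtype=float)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return 0.0  # constant image: one gray level, zero entropy
    edges = np.linspace(lo, hi, levels + 1)[1:-1]  # interior bin edges
    bins = np.digitize(img, edges)                 # values in 0..levels-1
    dr, dc = offset
    rows, cols = bins.shape
    glcm = np.zeros((levels, levels))
    # count co-occurrences of gray levels at the given offset
    for r in range(max(0, -dr), min(rows, rows - dr)):
        for c in range(max(0, -dc), min(cols, cols - dc)):
            glcm[bins[r, c], bins[r + dr, c + dc]] += 1
    glcm += glcm.T                  # make the GLCM symmetric
    p = glcm / glcm.sum()           # joint probabilities
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))
```

    A uniform-intensity DRO gives zero GLCM entropy, while textured objects give higher values, which is why the DROs above are useful ground truth for this feature.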

  2. Replication Package - How Do Requirements Evolve During Elicitation? An...

    • zenodo.org
    bin, zip
    Updated Apr 21, 2022
    Cite
    Alessio Ferrari; Paola Spoletini; Sourav Debnath (2022). Replication Package - How Do Requirements Evolve During Elicitation? An Empirical Study Combining Interviews and App Store Analysis [Dataset]. http://doi.org/10.5281/zenodo.6472498
    Explore at:
    bin, zip
    Available download formats
    Dataset updated
    Apr 21, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alessio Ferrari; Paola Spoletini; Sourav Debnath
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the replication package for the paper titled "How Do Requirements Evolve During Elicitation? An Empirical Study Combining Interviews and App Store Analysis", by Alessio Ferrari, Paola Spoletini and Sourav Debnath.

    The package contains the following folders and files.

    /R-analysis

    This is a folder containing all the R implementations of the statistical tests included in the paper, together with the source .csv files used to produce the results. Each R file has the same title as the associated .csv file. The titles of the files reflect the RQs as they appear in the paper. The association between R files and tables in the paper is as follows:

    - RQ1-1-analyse-story-rates.R: Table 1, user story rates

    - RQ1-1-analyse-role-rates.R: Table 1, role rates

    - RQ1-2-analyse-story-category-phase-1.R: Table 3, user story category rates in phase 1 compared to original rates

    - RQ1-2-analyse-role-category-phase-1.R: Table 5, role category rates in phase 1 compared to original rates

    - RQ2.1-analysis-app-store-rates-phase-2.R: Table 8, user story and role rates in phase 2

    - RQ2.2-analysis-percent-three-CAT-groups-ph1-ph2.R: Table 9, comparison of the categories of user stories in phase 1 and 2

    - RQ2.2-analysis-percent-two-CAT-roles-ph1-ph2.R: Table 10, comparison of the categories of roles in phase 1 and 2.

    The .csv files used for the statistical tests are also used to produce boxplots. The association between boxplot figures and files is as follows.

    - RQ1-1-story-rates.csv: Figure 4

    - RQ1-1-role-rates.csv: Figure 5

    - RQ1-2-categories-phase-1.csv: Figure 8

    - RQ1-2-role-category-phase-1.csv: Figure 9

    - RQ2-1-user-story-and-roles-phase-2.csv: Figure 13

    - RQ2.2-percent-three-CAT-groups-ph1-ph2.csv: Figure 14

    - RQ2.2-percent-two-CAT-roles-ph1-ph2.csv: Figure 17

    - IMG-only-RQ2.2-us-category-comparison-ph1-ph2.csv: Figure 15

    - IMG-only-RQ2.2-frequent-roles.csv: Figure 18

    NOTE: The last two .csv files do not have associated statistical tests, but are used solely to produce boxplots.

    /Data-Analysis

    This folder contains all the data used to answer the research questions.

    RQ1.xlsx: includes all the data associated with the RQ1 subquestions, with two tabs for each subquestion (one for user stories and one for roles). The names of the tabs are self-explanatory of their content.

    RQ2.1.xlsx: includes all the data for the RQ2.1 subquestion. Specifically, it includes the following tabs:

    - Data Source-US-category: for each category of user story, and for each analyst, there are two lines. The first reports the number of user stories in that category for phase 1, and the second reports the number of user stories in that category for phase 2, considering the specific analyst.

    - Data Source-role: for each category of role, and for each analyst, there are two lines. The first reports the number of user stories in that role for phase 1, and the second reports the number of user stories in that role for phase 2, considering the specific analyst.

    - RQ2.1 rates: reports the final rates for RQ2.1.

    NOTE: The other tabs are used to support the computation of the final rates.

    RQ2.2.xlsx: includes all the data for the RQ2.2 subquestion. Specifically, it includes the following tabs:

    - Data Source-US-category: same as RQ2.1.xlsx

    - Data Source-role: same as RQ2.1.xlsx

    - RQ2.2-category-group: comparison between groups of categories in the different phases, used to produce Figure 14

    - RQ2.2-role-group: comparison between role groups in the different phases, used to produce Figure 17

    - RQ2.2-specific-roles-diff: difference between specific roles, used to produce Figure 18

    NOTE: the other tabs are used to support the computation of the values reported in the tabs above.

    RQ2.2-single-US-category.xlsx: includes the data for the RQ2.2 subquestion associated with single categories of user stories. A separate file is used given the complexity of the computations.

    - Data Source-US-category: same as RQ2.1.xlsx

    - Totals: total number of user stories for each analyst in phase 1 and phase 2

    - Results-Rate-Comparison: difference between rates of user stories in phase 1 and phase 2, used to produce the file "img/IMG-only-RQ2.2-us-category-comparison-ph1-ph2.csv", which is in turn used to produce Figure 15

    - Results-Analysts: number of analysts using each novel category produced in phase 2, used to produce Figure 16.

    NOTE: the other tabs are used to support the computation of the values reported in the tabs above.

    RQ2.3.xlsx: includes the data for the RQ2.3 subquestion. Specifically, it includes the following tabs:

    - Data Source-US-category: same as RQ2.1.xlsx

    - Data Source-role: same as RQ2.1.xlsx

    - RQ2.3-categories: novel categories produced in phase 2, used to produce Figure 19

    - RQ2-3-most-frequent-categories: most frequent novel categories

    /Raw-Data-Phase-I

    The folder contains one Excel file for each analyst, s1.xlsx...s30.xlsx, plus the file of the original user stories with annotations (original-us.xlsx). Each file contains two tabs:

    - Evaluation: includes the annotation of the user stories as an existing user story in the original categories (annotated with "E"), a novel user story in a certain category (refinement, annotated with "N"), or a novel user story in a novel category (name of the category in column "New Feature"). NOTE: in the paper, the "refinement" case is reported as annotated with "R" (instead of "N", as in the files) to make the paper clearer and easier to read.

    - Roles: roles used in the user stories, and count of the user stories belonging to a certain role.

    /Raw-Data-Phase-II

    The folder contains one Excel file for each analyst, s1.xlsx...s30.xlsx. Each file contains two tabs:

    - Analysis: includes the annotation of the user stories as belonging to an existing original category (X), or to categories introduced after interviews or after app-store-inspired elicitation (name of category in "Cat. Created in PH1"), or to entirely novel categories (name of category in "New Category").

    - Roles: roles used in the user stories, and count of the user stories belonging to a certain role.

    /Figures

    This folder includes the figures reported in the paper. The boxplots are generated from the data using the tool http://shiny.chemgrid.org/boxplotr/. The histograms and other plots are produced with Excel, and are also reported in the Excel files listed above.

  3. Current Population Survey (CPS)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the current population survey (cps) annual social and economic supplement (asec) with r
    the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.
    this new github repository contains three scripts:
    2005-2012 asec - download all microdata.R: download the fixed-width file containing household, family, and person records; import by separating this file into three tables, then merge 'em together at the person-level; download the fixed-width file containing the person-level replicate weights; merge the rectangular person-level file with the replicate weights, then store it in a sql database; create a new variable - one - in the data table
    2012 asec - analysis examples.R: connect to the sql database created by the 'download all microdata' program; create the complex sample survey object, using the replicate weights; perform a boatload of analysis examples
    replicate census estimates - 2011.R: connect to the sql database created by the 'download all microdata' program; create the complex sample survey object, using the replicate weights; match the sas output shown in the png file below
    2011 asec replicate weight sas output.png: statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document. click here to view these three scripts
    for more detail about the current population survey - annual social and economic supplement (cps-asec), visit: the census bureau's current population survey page; the bureau of labor statistics' current population survey page; the current population survey's wikipedia article
    notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.
confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
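    the fixed-width-file-into-a-database idea described above can be sketched in python as well - pandas plus sqlite here instead of the parse.SAScii + RSQLite route the scripts actually take, and the column positions below are invented (the real layouts live in nber's sas importation code):

```python
import io
import sqlite3
import pandas as pd

# toy stand-in for a cps-asec fixed-width extract: a record-type digit,
# a person id, and an income field (positions invented for illustration)
raw = io.StringIO(
    "3001045000\n"
    "3002038500\n"
    "3003112750\n"
)
colspecs = [(0, 1), (1, 4), (4, 10)]       # (start, stop) byte ranges
persons = pd.read_fwf(raw, colspecs=colspecs,
                      names=["rectype", "person_id", "income"])

# park the rectangular person-level table in a sqlite database,
# in the spirit of the RSQLite step described above
con = sqlite3.connect(":memory:")
persons.to_sql("asec_persons", con, index=False)
total = con.execute("select sum(income) from asec_persons").fetchone()[0]
```

    the point of the database step is the same as in the scripts: once the microdata is in sql, you can filter and aggregate two hundred thousand records without holding everything in memory.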

  4. IDPQuantify: Combining Precursor Intensity with Spectral Counts for Protein...

    • acs.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Yao-Yi Chen; Matthew C. Chambers; Ming Li; Amy-Joan L. Ham; Jeffrey L. Turner; Bing Zhang; David L. Tabb (2023). IDPQuantify: Combining Precursor Intensity with Spectral Counts for Protein and Peptide Quantification [Dataset]. http://doi.org/10.1021/pr400438q.s002
    Explore at:
    xls
    Available download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    ACS Publications
    Authors
    Yao-Yi Chen; Matthew C. Chambers; Ming Li; Amy-Joan L. Ham; Jeffrey L. Turner; Bing Zhang; David L. Tabb
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Differentiating and quantifying protein differences in complex samples poses significant challenges in sensitivity and specificity. Label-free quantification can draw from two different information sources: precursor intensities and spectral counts. Intensities are accurate for calculating protein relative abundance, but values are often missing due to peptides that are identified sporadically. Spectral counting can reliably reproduce difference lists, but differentiating peptides or quantifying all but the most concentrated protein changes is usually beyond its abilities. Here we developed new software, IDPQuantify, to align multiple replicates using principal component analysis, extract accurate precursor intensities from MS data, and combine intensities with spectral counts for significant gains in differentiation and quantification. We have applied IDPQuantify to three comparative proteomic data sets featuring gold standard protein differences spiked in complicated backgrounds. The software is able to associate peptides with peaks that are otherwise left unidentified to increase the efficiency of protein quantification, especially for low-abundance proteins. By combining intensities with spectral counts from IDPicker, it gains an average of 30% more true positive differences among top differential proteins. IDPQuantify quantifies protein relative abundance accurately in these test data sets to produce good correlations between known and measured concentrations.

  5. Variable Terrestrial GPS Telemetry Detection Rates: Parts 1 - 7—Data

    • s.cnmilf.com
    • data.usgs.gov
    • +3more
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Variable Terrestrial GPS Telemetry Detection Rates: Parts 1 - 7—Data [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/variable-terrestrial-gps-telemetry-detection-rates-parts-1-7data
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    Studies utilizing Global Positioning System (GPS) telemetry rarely result in 100% fix success rates (FSR). Many assessments of wildlife resource use do not account for missing data, either assuming data loss is random or because of a lack of practical treatment for systematic data loss. Several studies have explored how the environment, technological features, and animal behavior influence rates of missing data in GPS telemetry, but previous spatially explicit models developed to correct for sampling bias have been specified to small study areas, a small range of data loss, or to be species-specific, limiting their general utility. Here we explore environmental effects on GPS fix acquisition rates across a wide range of environmental conditions and detection rates for bias correction of terrestrial GPS-derived, large mammal habitat use. We also evaluate patterns in missing data that relate to potential animal activities that change the orientation of the antennae, and characterize home-range probability of GPS detection for 4 focal species: cougars (Puma concolor), desert bighorn sheep (Ovis canadensis nelsoni), Rocky Mountain elk (Cervus elaphus ssp. nelsoni) and mule deer (Odocoileus hemionus). Part 1, Positive Openness Raster (raster dataset): Openness is an angular measure of the relationship between surface relief and horizontal distance. For angles less than 90 degrees it is equivalent to the internal angle of a cone with its apex at a DEM location, constrained by neighboring elevations within a specified radial distance. A 480 meter search radius was used for this calculation of positive openness. Openness incorporates the terrain line-of-sight or viewshed concept and is calculated from multiple zenith and nadir angles, here along eight azimuths. Positive openness measures openness above the surface, with high values for convex forms and low values for concave forms (Yokoyama et al. 2002).
    We calculated positive openness using a custom python script, following the methods of Yokoyama et al. (2002), using a USGS National Elevation Dataset as input. Part 2, Northern Arizona GPS Test Collar (csv): Bias correction in GPS telemetry datasets requires a strong understanding of the mechanisms that result in missing data. We tested wildlife GPS collars in a variety of environmental conditions to derive a predictive model of fix acquisition. We found terrain exposure and tall over-story vegetation are the primary environmental features that affect GPS performance. Model evaluation showed a strong correlation (0.924) between observed and predicted fix success rates (FSR) and showed little bias in predictions. The model's predictive ability was evaluated using two independent datasets from stationary test collars of different makes/models, with different fix interval programming, placed at different study sites. No statistically significant differences (95% CI) between predicted and observed FSRs suggest that changes in technological factors have minor influence on the model's ability to predict FSR in new study areas in the southwestern US. The model training data are provided here for fix attempts by hour. This table can be linked with the site location shapefile using the site field. Part 3, Probability Raster (raster dataset): Bias correction in GPS telemetry datasets requires a strong understanding of the mechanisms that result in missing data. We tested wildlife GPS collars in a variety of environmental conditions to derive a predictive model of fix acquisition. We found terrain exposure and tall overstory vegetation are the primary environmental features that affect GPS performance. Model evaluation showed a strong correlation (0.924) between observed and predicted fix success rates (FSR) and showed little bias in predictions.
    The model's predictive ability was evaluated using two independent datasets from stationary test collars of different makes/models, with different fix interval programming, placed at different study sites. No statistically significant differences (95% CI) between predicted and observed FSRs suggest that changes in technological factors have minor influence on the model's ability to predict FSR in new study areas in the southwestern US. We evaluated GPS telemetry datasets by comparing the mean probability of a successful GPS fix across study animals' home-ranges to the actual observed FSR of GPS-downloaded deployed collars on cougars (Puma concolor), desert bighorn sheep (Ovis canadensis nelsoni), Rocky Mountain elk (Cervus elaphus ssp. nelsoni) and mule deer (Odocoileus hemionus). Comparing the mean probability of acquisition within study animals' home-ranges and observed FSRs of GPS-downloaded collars resulted in an approximately 1:1 linear relationship with an r-squared of 0.68. Part 4, GPS Test Collar Sites (shapefile): Bias correction in GPS telemetry datasets requires a strong understanding of the mechanisms that result in missing data. We tested wildlife GPS collars in a variety of environmental conditions to derive a predictive model of fix acquisition. We found terrain exposure and tall over-story vegetation are the primary environmental features that affect GPS performance. Model evaluation showed a strong correlation (0.924) between observed and predicted fix success rates (FSR) and showed little bias in predictions. The model's predictive ability was evaluated using two independent datasets from stationary test collars of different makes/models, with different fix interval programming, placed at different study sites. No statistically significant differences (95% CI) between predicted and observed FSRs suggest that changes in technological factors have minor influence on the model's ability to predict FSR in new study areas in the southwestern US.
    Part 5, Cougar Home Ranges (shapefile): Cougar home-ranges were calculated to compare the mean probability of GPS fix acquisition across the home-range to the actual fix success rate (FSR) of the collar, as a means of evaluating whether characteristics of an animal's home-range have an effect on observed FSR. We estimated home-ranges using the Local Convex Hull (LoCoH) method with the 90th isopleth. Only data obtained from GPS download of retrieved units were used. Satellite-delivered data were omitted from the analysis for animals where the collar was lost or damaged, because satellite delivery tends to lose an additional 10% of data. Comparisons with home-range mean probability of fix were also used as a reference for assessing whether the frequency with which animals use areas of low GPS acquisition rates may play a role in observed FSRs. Part 6, Cougar Fix Success Rate by Hour (csv): Cougar GPS collar fix success varied by hour-of-day, suggesting circadian rhythms with bouts of rest during daylight hours may change the orientation of the GPS receiver, affecting the ability to acquire fixes. Raw data of overall fix success rates (FSR) and FSR by hour were used to predict relative reductions in FSR. Data only include direct GPS download datasets. Satellite-delivered data were omitted from the analysis for animals where the collar was lost or damaged, because satellite delivery tends to lose approximately an additional 10% of data. Part 7, Openness Python Script version 2.0: This python script was used to calculate positive openness using a 30 meter digital elevation model for a large geographic area in Arizona, California, Nevada and Utah. A scientific research project used the script to explore environmental effects on GPS fix acquisition rates across a wide range of environmental conditions and detection rates for bias correction of terrestrial GPS-derived, large mammal habitat use.
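    As an illustration of the Part 1/Part 7 computation, a simplified positive openness calculation (after Yokoyama et al. 2002) might look like the following Python sketch; the grid size, search radius, and edge handling here are toy assumptions, not the USGS script itself:

```python
import numpy as np

def positive_openness(dem, cell_size=30.0, radius=3):
    """Simplified positive openness (after Yokoyama et al. 2002).

    For each cell and each of 8 azimuths, find the steepest upward
    angle to neighbors within `radius` cells; positive openness is
    the mean zenith angle (90 deg minus that elevation angle). The
    real script used a 480 m search radius on a USGS DEM.
    """
    dem = np.asarray(dem, dtype=float)
    rows, cols = dem.shape
    azimuths = [(-1, 0), (-1, 1), (0, 1), (1, 1),
                (1, 0), (1, -1), (0, -1), (-1, -1)]
    out = np.zeros_like(dem)
    for r in range(rows):
        for c in range(cols):
            zeniths = []
            for dr, dc in azimuths:
                best = -np.inf  # steepest elevation angle along azimuth
                for step in range(1, radius + 1):
                    rr, cc = r + dr * step, c + dc * step
                    if not (0 <= rr < rows and 0 <= cc < cols):
                        break
                    dist = step * cell_size * np.hypot(dr, dc)
                    angle = np.degrees(
                        np.arctan((dem[rr, cc] - dem[r, c]) / dist))
                    best = max(best, angle)
                if np.isfinite(best):
                    zeniths.append(90.0 - best)
            out[r, c] = np.mean(zeniths)
    return out
```

    Consistent with the description above, a flat surface gives 90 degrees, a local peak (convex form) gives values above 90, and a pit (concave form) gives values below 90.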

  6. Hospital Annual Financial Data - Selected Data & Pivot Tables

    • data.chhs.ca.gov
    • data.ca.gov
    • +5 more
    csv, data, doc, html +4
    Updated Apr 23, 2025
    Cite
    Department of Health Care Access and Information (2025). Hospital Annual Financial Data - Selected Data & Pivot Tables [Dataset]. https://data.chhs.ca.gov/dataset/hospital-annual-financial-data-selected-data-pivot-tables
    Explore at:
    xlsx, xlsx(770931), xls(44967936), data, xls, html, xls(51554816), xlsx(752914), xls(16002048), xlsx(765216), xls(44933632), xls(14657536), xlsx(750199), xlsx(756356), pdf(303198), pdf(333268), xls(51424256), xls(19650048), xls(18445312), pdf(383996), pdf(121968), xlsx(768036), zip, xlsx(779866), xls(19625472), xlsx(771275), xlsx(758376), xls(19599360), doc, xls(19577856), pdf(310420), xlsx(758089), xls(18301440), xlsx(754073), xlsx(763636), xlsx(14714368), xlsx(769128), xls(920576), csv(205488092), pdf(258239), xlsx(777616), xlsx(782546), xlsx(790979)
    Available download formats
    Dataset updated
    Apr 23, 2025
    Dataset authored and provided by
    Department of Health Care Access and Information
    Description

    On an annual basis (individual hospital fiscal year), individual hospitals and hospital systems report detailed facility-level data on service capacity, inpatient/outpatient utilization, patients, revenues and expenses by type and payer, and balance sheet and income statement.

    Due to the large size of the complete dataset, a selected set of data representing a wide range of commonly used data items has been created that can be easily managed and downloaded. The selected data file includes general hospital information, utilization data by payer, revenue data by payer, expense data by natural expense category, financial ratios, and labor information.

    There are two groups of data contained in this dataset: 1) Selected Data - Calendar Year: To make it easier to compare hospitals by year, hospital reports with report periods ending within a given calendar year are grouped together. The Pivot Tables for a specific calendar year are also found here. 2) Selected Data - Fiscal Year: Hospital reports with report periods ending within a given fiscal year (July-June) are grouped together.

  7. Dataset for Ranking of Renewable Energy Sources Using Delphi-MGDM Framework

    • data.mendeley.com
    Updated Feb 26, 2020
    + more versions
    Cite
    Dave Pojadas (2020). Dataset for Ranking of Renewable Energy Sources Using Delphi-MGDM Framework [Dataset]. http://doi.org/10.17632/nmkwzz42k5.4
    Explore at:
    Dataset updated
    Feb 26, 2020
    Authors
    Dave Pojadas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data sets are part of the study titled "A web-based Delphi multi-criteria group decision-making framework for renewable energy project development processes." The study aims to outline and implement the web-based Delphi Multi-criteria Group Decision Making (Delphi-MGDM) Framework, which is intended to facilitate top-level group decision-making for renewable energy project development and long-term strategic direction setting. The datasets include: (1) the weights of criteria obtained from the judgments of the experts, (2) the summary of criteria scores, (3) the comparison table dataset, and (4) the full report of the Visual PROMETHEE. The “Criteria Weighing Dataset” is obtained from the judgment of experts using the AHP-Online System created by Klaus D. Goepel (available at https://bpmsg.com/ahp/ahp.php). On a pairwise comparison basis, we asked the experts to give their opinion on four (4) criteria and then the sixteen (16) sub-criteria in three rounds. The group weights after the third round are considered the final weights of criteria and sub-criteria. To rank RES using MCDA, we used the data from the literature and the Philippines’ DOE for all ten quantitative sub-criteria. However, there are six qualitative sub-criteria, so we asked the opinion of experts on how solar, wind, biomass, and hydro-power perform in each criterion based on their knowledge and expertise. This time, we used a self-derived questionnaire, and as a summary of this process we produced the “Scoring of Options Dataset.” We took the average, minimum and maximum values of the scores to make data for the ranking in three cases (realistic, pessimistic, and optimistic). The "Comparison table" dataset is composed of comparison tables for the three cases. Table A reflects the data for the realistic case, in which we use the averages of the qualitative inputs from experts, the averages of quantitative data obtained in ranges, and the actual value of data not given in ranges.
Table B reflects the data for the optimistic case. For qualitative data, we used the minimum value of the sub-criteria to be minimized and maximum value for sub-criteria to maximized. For quantitative data in ranges, we used the minimum value of cost sub-criteria and maximum value of benefit sub-criteria. We estimated fictitious data for some quantitative data not given in ranges. Table C reflects the data for the pessimistic case. We used the same concept with Table B, but with opposite choices. For instance, we used the maximum value of cost sub-criteria and minimum value of benefit sub-criteria for quantitative data. Finally, we used Visual PROMETHEE (available at http://www.promethee-gaia.net/vpa.html) to rank renewable energy sources. The "Visual PROMETHEE Full Report" dataset is the actual report exported from the Visual PROMETHEE application – containing a partial and complete ranking of RES.
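    The pairwise-comparison weighting step described above can be sketched in a few lines. This is a minimal illustration of an AHP-style priority calculation using the geometric-mean method, not the actual AHP-Online System implementation; the 3x3 comparison matrix is hypothetical.

```python
import math

def ahp_weights(matrix):
    """Approximate AHP priority weights from a reciprocal pairwise
    comparison matrix using the geometric-mean (row products) method."""
    n = len(matrix)
    geo = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo)
    return [g / total for g in geo]

# Hypothetical 3-criterion matrix (diagonal = 1, reciprocal entries):
# criterion 1 is judged 3x as important as criterion 2, 5x as criterion 3.
pairwise = [
    [1,     3,   5],
    [1 / 3, 1,   2],
    [1 / 5, 1 / 2, 1],
]
weights = ahp_weights(pairwise)
print([round(w, 3) for w in weights])
```

    Group weights, as produced after each Delphi round, would then be aggregates of such per-expert weight vectors.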

  8. Three variants of synthetic benchmarks time series of GPS and ERA-Interim IWV differences

    • figshare.com
    zip
    Updated Jan 28, 2020
    Anna Kłos; Eric Pottiaux; Roeland Van Malderen (2020). Three variants of synthetic benchmarks time series of GPS and ERA-Interim IWV differences [Dataset]. http://doi.org/10.6084/m9.figshare.11733615.v1
    Available download formats: zip
    Dataset updated
    Jan 28, 2020
    Dataset provided by
    figshare
    Authors
    Anna Kłos; Eric Pottiaux; Roeland Van Malderen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description of the synthetic datasets
    Daily synthetic series of 6000 samples (i.e. a length of 16 y * 365 d) for 120 IGS sites were simulated for GPS-retrieved (IGS repro1) Integrated Water Vapour (IWV) values, and for IWV differences between the ERA-Interim model and GPS (IGS repro1), based on the characterisation (signal and noises) derived from the real datasets. The real ERAI-GPS IWV differences were first homogenized with a manual detection of breaks to provide the most consistent series. All manually detected epochs of breaks were cross-validated against information included in the log-files of the stations. If a manually detected break was not reported as a change in a log-file, it was not corrected for, unless the offset was clearly seen in the differences (ERAI-GPS). We assume the ERA-Interim model is an absolute reference with no artificial breaks; under this assumption, only climate signals should be responsible for jumps in the time series.
    We tested different approaches, as we generated synthetic datasets for the IGS repro1, the ERA-Interim, and their differences. We found that generating the synthetic differences directly matches the real differences more closely than building the differences afterwards from separately generated synthetic ERA-Interim and synthetic IGS repro1 IWV time series, so we proceeded with the synthetic differences.
    Specifications of the synthetic datasets available
    Three different variants of those synthetic datasets were constructed:
    Variant 1, the 'Easy' dataset: seasonal signals (annual, semi-annual, 3 & 4 months if present for a particular station) + offsets + white noise.
    Variant 2, the 'Moderate' dataset: seasonal signals (annual, semi-annual, 3 & 4 months) + offsets + autoregressive process of the first order + white noise (AR(1)+WH).
    Variant 3, the 'Complex' dataset: trend + seasonal signals (annual, semi-annual, 3 & 4 months) + offsets + AR(1)+WH + gaps.
    Variant 1 was created only for the ERA-Interim - GPS differences, while Variants 2 and 3 were created both for 1) differences of IWV between ERAI and GPS (ERAI-GPS), and 2) GPS itself. The values of trends, amplitudes of seasonal signals, noise processes, and percentage of gaps were modelled directly from the characteristics derived from the real datasets. The epochs of offsets were simulated randomly, separately for each variant, but the number and amplitudes of the offsets are characteristic of the real datasets.
    File format of the synthetic datasets
    Each simulated series is stored in a separate file. As for the real dataset, each file includes three columns: "year, y-x, x". "Year" is a date formatted as YYMMDD.HHMMSS (e.g. 950101.120000 for 1st January 1995 at 12:00 UTC), column "y-x" contains the differences between the ERAI and GPS synthetic values (in that order), and "x" contains the 'GPS-retrieved' IWV synthetic values.
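    A series of the 'Moderate' kind (seasonal signals + offset + AR(1)+WH) can be sketched as below. The amplitudes, noise parameters, and offset epoch here are illustrative stand-ins, not the station-derived values actually used to build the dataset.

```python
import math
import random

def synthetic_series(n_days=6000, phi=0.6, sigma=0.5, seed=42):
    """Toy 'Moderate'-variant series: annual + semi-annual signals,
    one offset, and an AR(1) + white-noise process.  All parameter
    values are hypothetical, for illustration only."""
    rng = random.Random(seed)
    offset_epoch = n_days // 2          # hypothetical break location
    offset_size = 1.5                   # hypothetical offset amplitude
    series = []
    ar = 0.0
    for t in range(n_days):
        ar = phi * ar + rng.gauss(0.0, sigma)          # AR(1) + white noise
        seasonal = (2.0 * math.sin(2 * math.pi * t / 365.25)     # annual
                    + 0.5 * math.sin(4 * math.pi * t / 365.25))  # semi-annual
        offset = offset_size if t >= offset_epoch else 0.0
        series.append(seasonal + offset + ar)
    return series

s = synthetic_series()
print(len(s))  # 6000 daily samples, ~16 years
```

    The 'Complex' variant would add a linear trend and randomly placed gaps on top of the same components.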

  9. Data associated with manuscript

    • figshare.com
    txt
    Updated Sep 3, 2023
    Elias Bloom; Javier Gutiérrez Illán; Matthew Brousil; John P. Reganold; Tobin D. Northfield; David W. Crowder (2023). Data associated with manuscript [Dataset]. http://doi.org/10.6084/m9.figshare.24057261.v1
    Available download formats: txt
    Dataset updated
    Sep 3, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Elias Bloom; Javier Gutiérrez Illán; Matthew Brousil; John P. Reganold; Tobin D. Northfield; David W. Crowder
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data associated with the main analysis and figures presented in the manuscript titled: "Long-term organic farming and floral diversity promotes stability of bee communities in agroecosystems."
    The dataset titled "Bloom_BetaDiverstiy_GPSPointsAndSiteCharacteristics_DataFinal.csv" is associated with the map of site locations shown in Figure 1. These data are also given in Table 2 (note no data are associated with Table 1). GPS points have been jittered to protect the identity of farmer collaborators. Exact locations are available upon request and after consideration by the lead author. No data are associated with Figure 2.
    The four datasets titled "Bloom_BetaDiversity_GeometricRemovalXXX_DataFinal.csv" are associated with the geometric species removal analysis accompanying Figure 3, where XXX is the scale (local or landscape) and term (species loss or removal) (see manuscript for details). Column definitions: SiteID - the site for which the statistic was generated (local level only); SiteID1 - the first site in the pairwise comparison that generated the statistic (landscape level only); SiteID2 - the second site in the pairwise comparison (landscape level only); year1 - the year (e.g., 20XX) the sample was taken at the site given in SiteID1 (landscape level only); year2 - the year the sample was taken at the site given in SiteID2 (landscape level only); variable - species removed at random; sim (replacement) or sne (loss) - the statistic with no species removed; value - the statistic with the species removed; vec - the color encoding the number of species removed (see the Figure 3 caption for column species number relationships).
    The dataset titled "Bloom_BetaDiverstiy_SADS_DataFinal.xlsx" contains three sheets, each associated with a sample year (2014, 2015, 2016). Within each sheet are the vectors of abundance values and species names used to create the species abundance models plotted in Figure 4.
    The datasets titled "Bloom_BetaDiverstiy_LocalLevelRegressions_DataFinal.csv" and "Bloom_BetaDiverstiy_LandscapeLevelRegressions_DataFinal" contain the statistics used to create Figure 5. Column definitions not previously given: detlaaic - difference in species abundance model fit (local level); Years.Since.Transition - years since transitioning to organic farming, scaled to aid regression model fitting; sor - the overall Sorenson's beta diversity term for bees across years or sites (depending on scale; see file name); sim - the species replacement term from the additive partition of Sorenson's beta diversity for bees across years or sites (depending on scale; see file name); nes - the species loss term from the additive partition of Sorenson's beta diversity for bees across years or sites (depending on scale; see file name); X.sor/X.sim/X.nes - where X is l or p for landscape or plant beta diversity for each term at the site across years or sites (depending on scale; see file name); diff_years - differences in time since transition to organic agriculture, scaled to aid regression model fitting (landscape level only); diff_do_new - difference in species abundance model fit (landscape level); p.all - the product of the p.sim and l.nes terms, used for plotting the interaction shown in Figure 5d.
    The dataset titled "Bloom_BetaDiverstiy_Jackknife_DataFinal.xlsx" contains 6 sheets corresponding to Figure 6 panels a-f in linear order. Column definitions: pvalue - the p-value found when the variable was removed from the site-by-variable matrix, used to create the histograms; variable - the variable removed from the site-by-variable matrix. Variables can be bee species, landscape classes, or plants given by their unique common names.
    The final 3 datasets are titled "Bloom_BetaDiverstiy_BeeSiteXSpeciesMatrix_DataFinal.csv", "Bloom_BetaDiverstiy_LandscapeSiteXClassMatrix_DataFinal.csv", and "Bloom_BetaDiverstiy_PlantSiteXCommonNameMatrix_DataFinal.csv". These contain the site-by-variable matrices used to generate the bee, landscape, and plant beta diversity metrics used in our analysis.

  10. Data from: Does the Disclosure of Gun Ownership Affect Crime? Evidence from New York

    • search.datacite.org
    • openicpsr.org
    • +1more
    Updated 2018
    Daniel Tannenbaum (2018). Does the Disclosure of Gun Ownership Affect Crime? Evidence from New York [Dataset]. http://doi.org/10.3886/e109802v1
    Dataset updated
    2018
    Dataset provided by
    DataCite (https://www.datacite.org/)
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    Authors
    Daniel Tannenbaum
    Description

    This repository contains the data and code necessary to replicate all figures and tables in the working paper "Does the disclosure of gun ownership affect crime? Evidence from New York" by Daniel Tannenbaum.
    There are four folders in this repository:
    (1) Build: contains all the .do files required to produce the analysis datasets from the raw data (i.e. the datasets in the RawData folder).
    (2) Analysis: contains all the .do files required to produce the figures and tables in the paper from the analysis datasets (i.e. the datasets in the AnalysisData folder).
    (3) RawData: contains all the raw datasets used to produce the AnalysisData datasets. The only raw dataset used in the paper that is excluded from this folder is the proprietary housing assessor and sales transaction data from DataQuick, owned by CoreLogic. If I receive approval to include these raw data, I will do so in a future version of this repository.
    (4) AnalysisData: contains all the analysis datasets that are created by the Build and used to produce the tables and figures in the paper.

    Running the file Master_analysis.do in the Analysis folder will produce, in one script, all the tables and figures in the paper.

  11. Bitter Creek Analysis Pedigree Data

    • catalog.data.gov
    • s.cnmilf.com
    Updated Sep 25, 2022
    U.S. EPA Office of Research and Development (ORD) (2022). Bitter Creek Analysis Pedigree Data [Dataset]. https://catalog.data.gov/dataset/bitter-creek-analysis-pedigree-data
    Dataset updated
    Sep 25, 2022
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    These data sets contain the raw and processed data used for the analyses, figures, and tables in the Region 8 memo "Characterization of chloride and conductivity levels in the Bitter Creek Watershed, WY"; they may also be used for other analyses, alone or in combination with other or new data. The data were used to assess whether chloride levels are naturally high in streams in the Bitter Creek, WY watershed, and how the chloride concentrations expected to protect 95 percent of aquatic genera in these streams compare to Wyoming's chloride criteria applicable to the Bitter Creek watershed. Owing to the arid conditions, background conductivity and chloride levels were characterized for surface flow and ground water flow conditions. Natural chloride levels were found to be less than current water quality criteria for Wyoming. Although the report was prepared for USEPA Region 8 and OST, Office of Water, it will also be of interest to the WDEQ, Sweetwater County Conservation District, and the regulated community. No formal metadata standard was used.
    Pedigree.xlsx contains the following worksheets:
    1. NOTES: description of the work and the other worksheets.
    2. Pedigree_Summary: source files used to create the figures and tables.
    3. DataFiles: data files used in the R code for creating the figures and tables.
    4. R_Script: summary of the R scripts.
    5. DataDictionary: data file titles in all data files.
    Folders:
    _Datasets: data files uploaded to the Environmental Dataset Gateway. Subfolders:
    _R: clean R scripts used to generate the document figures and tables.
    _Tables_Figures: files generated from the R scripts and used in the Region 8 memo.
    R Code and Data: all additional files used for this project, including original files, intermediate files, extra output files, and extra functions. The "_R" folder stores the R scripts with their input and output files and an R project file. Users can open the R project and run the R scripts directly from the "_R" folder or the XC95 folder after installing R, RStudio, and the associated R packages.

  12. EukProt: a database of genome-scale predicted proteins across the diversity of eukaryotic life

    • figshare.com
    txt
    Updated Nov 21, 2022
    Daniel Richter; Cédric Berney; Jürgen Strassert; Fabien Burki; Colomban de Vargas (2022). EukProt: a database of genome-scale predicted proteins across the diversity of eukaryotic life [Dataset]. http://doi.org/10.6084/m9.figshare.12417881.v1
    Available download formats: txt
    Dataset updated
    Nov 21, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Daniel Richter; Cédric Berney; Jürgen Strassert; Fabien Burki; Colomban de Vargas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Version 1 (8 May, 2019)
    A database of published/publicly available predicted protein sets and unannotated genomes selected to represent eukaryotic diversity, including 708 species across all major supergroups (Amorphea, Archaeplastida, CRuMs, Cryptista, Discoba, Haptista, Hemimastigophora, Metamonada, TSAR) and orphan taxa (Ancyromonadida, Malawimonadidae, Picozoa) (Burki et al. 2019, DOI: 10.1016/j.tree.2019.08.008).
    EukProt_proteins.v01.2019_05_08.tgz: predicted protein sets, for 694 species with either a genome with predicted proteins (242 species) or a transcriptome (452 species).
    EukProt_unannotated_genomes.v01.2019_05_08.tgz: genomes, for 14 species with genomic data lacking predicted proteins (these are almost exclusively single-cell genomes).
    EukProt_assembled_transcriptomes.v01.2019_05_08.tgz: contigs, for 46 species with publicly available reads but no publicly available transcriptome assembly. The proteins predicted from these assemblies are included in the proteins file.
    EukProt_included_data_sets.v01.2019_05_08.txt and EukProt_not_included_data_sets.v01.2019_05_08.txt: tables of information on the data sets either included or not included in the database. Tab-delimited; multiple entries in the same cell are comma-delimited; missing data is represented with the "N/A" value. Columns:
    EukProt_ID: the unique identifier associated with the data set. This will not change among versions. If a new data set becomes available for the species, it will be assigned a new unique identifier.
    Name_to_Use: the name of the species for protein/genome/assembled transcriptome files.
    Strain: the strain(s) of the species sequenced.
    Previous_Names: any previous names that this species was known by, not including cases where a species was originally assigned to a genus but not identified to the species level (e.g., Goniomonas sp., now identified as Goniomonas avonlea, is not listed as a previous name).
    Replaces_EukProt_ID/Replaced_by_EukProt_ID (included for forward compatibility): if the data set changes with respect to an earlier version, the EukProt ID of the data set that it replaces (in the included table) or that it is replaced by (in the not_included table).
    Genus_UniEuk, Epithet_UniEuk, Supergroup_UniEuk, Taxogroup_UniEuk: taxonomic identifiers at different levels of the UniEuk taxonomy (based on Adl et al. 2018, DOI: 10.1111/jeu.12691).
    Taxonomy_UniEuk: the full lineage of the species in the UniEuk taxonomy (semicolon-delimited).
    Merged_Strains: whether multiple strains of the same species were merged to create the data set.
    Data_Source_URL: the URL(s) from which the data were downloaded.
    Data_Source_Name: the name of the data set (as assigned by the data source).
    Paper_DOI: the DOI(s) of the paper(s) that published the data set.
    Actions_Prior_to_Use: the action(s) that were taken to process the publicly available files in order to produce the data set in this database, excluding genomes lacking annotations (these are provided as is, with the label 'translated sequence search' indicating that proteins of interest can be identified with translated sequence homology search software). Actions taken:
    'assemble mRNA': Trinity v. 2.8.4, http://trinityrnaseq.github.io/
    'CD-HIT': v. 4.6, http://weizhongli-lab.org/cd-hit/
    'extractfeat', 'transeq', 'trimseq': from the EMBOSS package v. 6.6.0.0, http://emboss.sourceforge.net/
    'translate mRNA': Transdecoder v. 5.3.0, http://transdecoder.github.io/
    All parameter values were default, unless otherwise specified.
    Data_Source_Type: the type of the source data (possible types: EST, transcriptome, single-cell transcriptome, genome, single-cell genome).
    Notes: additional information on the data set (for example, why it was not included).
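    The table format described above (tab-delimited columns, comma-delimited multi-entry cells, "N/A" for missing data) can be consumed with a short parser. This is a sketch, not an official EukProt tool; the column names and sample rows below are only a subset chosen for illustration.

```python
import csv
import io

def parse_eukprot_table(text):
    """Parse a EukProt-style table: tab-delimited columns,
    comma-delimited multi-entry cells, "N/A" for missing data.
    Cells containing a comma are split into lists (a heuristic)."""
    rows = []
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        parsed = {}
        for key, value in row.items():
            if value == "N/A":
                parsed[key] = None          # missing data
            elif "," in value:
                parsed[key] = [v.strip() for v in value.split(",")]
            else:
                parsed[key] = value
        rows.append(parsed)
    return rows

# Hypothetical two-row sample in the documented format:
sample = ("EukProt_ID\tName_to_Use\tPrevious_Names\n"
          "EP00001\tGoniomonas avonlea\tN/A\n"
          "EP00002\tExample species\tName one, Name two\n")
records = parse_eukprot_table(sample)
print(records[1]["Previous_Names"])
```

    A real reader might restrict the comma-splitting to columns documented as multi-entry (e.g. Data_Source_URL, Paper_DOI), since species names can in principle contain commas.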

  13. Data for: Predicting habitat suitability for Townsend's big-eared bats across California in relation to climate change

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Dec 12, 2022
    Natalie Hamilton; Michael Morrison; Leila Harris; Joseph Szewczak; Scott Osborn (2022). Data for: Predicting habitat suitability for Townsend’s big-eared bats across California in relation to climate change [Dataset]. http://doi.org/10.5061/dryad.4j0zpc8f1
    Available download formats: zip
    Dataset updated
    Dec 12, 2022
    Dataset provided by
    California Department of Fish and Wildlife
    Texas A&M University
    California State Polytechnic University
    University of California, Davis
    Authors
    Natalie Hamilton; Michael Morrison; Leila Harris; Joseph Szewczak; Scott Osborn
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Area covered
    California
    Description

    Aim: Effective management decisions depend on knowledge of species distribution and habitat use. Maps generated from species distribution models are important in predicting previously unknown occurrences of protected species. However, if populations are seasonally dynamic or locally adapted, failing to consider population-level differences could lead to erroneous determinations of occurrence probability and ineffective management. The study goal was to model the distribution of a species of special concern, Townsend's big-eared bat (Corynorhinus townsendii), in California. We incorporate seasonal and spatial differences to estimate the distribution under current and future climate conditions.
    Methods: We built species distribution models using all records from statewide roost surveys, and by subsetting the data to seasonal colonies (representing different phenological stages) and to Environmental Protection Agency Level III ecoregions, to understand how environmental needs vary with these factors. We projected the species' distribution for 2061-2080 in response to low and high emissions scenarios and calculated the expected range shifts.
    Results: The estimated distribution differed between the combined (full-dataset) and phenologically explicit models, while ecoregion-specific models were largely congruent with the combined model. Across the majority of models, precipitation was the most important variable predicting the presence of C. townsendii roosts. Under future climate scenarios, the distribution of C. townsendii is expected to contract throughout the state; however, suitable areas will expand within some ecoregions.
    Main conclusion: Comparison of phenologically explicit models with combined models indicates the combined models better predict the extent of the known range of C. townsendii in California. However, life-history-explicit models aid in understanding the different environmental needs and distributions of the major phenological stages. Differences between ecoregion-specific and statewide predictions of habitat contractions highlight the need to consider regional variation when forecasting species' responses to climate change. These models can aid in directing seasonally explicit surveys and predicting the regions most vulnerable under future climate conditions.
    Methods
    Study area and survey data: The study area covers the U.S. state of California, which has steep environmental gradients that support an array of species (Dobrowski et al. 2011). Because California is ecologically diverse, with regions ranging from forested mountain ranges to deserts, we examined local environmental needs by modeling at both the statewide and ecoregion scale, using U.S. Environmental Protection Agency (EPA) Level III ecoregion designations; there are thirteen Level III ecoregions in California (Table S1.1) (Griffith et al. 2016). Species occurrence data used in this study were from a statewide survey of C. townsendii in California conducted by Harris et al. (2019). Briefly, methods included field surveys from 2014-2017 following a modified bat survey protocol to create a stratified random sampling scheme. Corynorhinus townsendii presence at roost sites was based on visual bat sightings. From these survey efforts, we have visual occurrence data for 65 maternity roosts, 82 hibernation roosts (hibernacula), and 91 active-season non-maternity roosts (transition roosts), for a total of 238 occurrence records (Figure 1, Table S1.1).
    Ecogeographical factors: We downloaded climatic variables from the WorldClim 2.0 bioclimatic variables (Fick & Hijmans, 2017) at a resolution of 5 arcmin for the broad-scale analysis and 30 arcsec for the ecoregion-specific analyses. To calculate elevation and slope, we used a digital elevation model (USGS 2022) in ArcGIS 10.8.1 (ESRI, 2006). The chosen set of environmental variables reflects knowledge of the climatic conditions and habitat relevant to bat physiology, phenology, and life history (Rebelo et al. 2010, Razgour et al. 2011, Loeb and Winters 2013, Razgour 2015, Ancillotto et al. 2016). To trim the global environmental variables to the same extent (the state of California), we used the R package "raster" (Hijmans et al. 2022). We performed a correlation analysis on the raster layers using the "layerStats" function and removed variables with a Pearson's coefficient > 0.7 (see Table 1 for the final model variables). For future climate conditions, we selected three general circulation models (GCMs) based on previous species distribution models of temperate bat species (Razgour et al. 2019) [Hadley Centre Global Environment Model version 2 Earth Systems model (HadGEM3-GC31_LL; Webb, 2019), Institut Pierre-Simon Laplace Coupled Model 6th Assessment Low Resolution (IPSL-CM6A-LR; Boucher et al., 2018), and Max Planck Institute for Meteorology Earth System Model Low Resolution (MPI-ESM1-2-LR; Brovkin et al., 2019)] and two contrasting greenhouse gas concentration trajectories (Shared Socio-economic Pathways, SSPs): a steady-decline pathway with CO2 concentrations of 360 ppmv (SSP1-2.6) and an increasing pathway with CO2 reaching around 2,000 ppmv (SSP5-8.5) (IPCC6). We modeled the distribution for present conditions and for a future (2061-2080) time period. Because one aim of our study was to determine the consequences of changing climate, we changed only the climatic data when projecting future distributions, keeping the other variables (elevation, slope) constant over time.
    Species distribution modeling: We generated distribution maps for total occurrences (maternity + hibernacula + transition; hereafter "combined models"), maternity colonies, hibernacula, and transition roosts. To estimate the present and future habitat suitability for C. townsendii in California, we used the maximum entropy (MaxEnt) algorithm in the "dismo" R package (Hijmans et al. 2021), through the advanced computing resources provided by Texas A&M High Performance Research Computing. We chose MaxEnt to aid the comparison of statewide and ecoregion-specific models, as MaxEnt outperforms other approaches on small datasets (as is the case for our ecoregion-specific models). We created 1,000 background points from random points in the environmental layers and performed a 5-fold cross-validation, which divided the occurrence records into training (80%) and testing (20%) datasets. We assessed the performance of our models by measuring the area under the receiver operating characteristic curve (AUC; Hanley & McNeil, 1982), where values >0.5 indicate that the model performs better than random: 0.5-0.7 indicates poor performance, 0.7-0.9 moderate performance, and 0.9-1 excellent performance (BCCVL, Hallgren et al., 2016). We also measured the maximum true skill statistic (TSS; Allouche, Tsoar, & Kadmon, 2006) to assess model performance. The maxTSS ranges from -1 to +1: values <0.4 indicate a model that performs no better than random, 0.4-0.55 poor performance, 0.55-0.7 moderate performance, 0.7-0.85 good performance, and >0.85 excellent performance (Samadi et al. 2022). Final distribution maps were generated using all occurrence records for each region (rather than the training/testing subsets), and the models were projected onto present and future climate conditions. Additionally, because the climatic conditions of the different ecoregions of California vary widely, we generated separate models for each ecoregion in an attempt to capture potential local effects of climate change. A general rule in species distribution modeling is that the number of occurrence points should be 10 times the number of predictors included in the model, meaning that we would need 50 occurrences in each ecoregion. One common way to overcome this limitation is the ensemble of small models (ESMs) (Breiner et al. 2015, 2018; Virtanen et al. 2018; Scherrer et al. 2019; Song et al. 2019) implemented in the ecospat R package. For our ESMs we implemented MaxEnt modeling, and the final ensemble model was created by averaging the individual bivariate models weighted by performance (AUC > 0.5). We also used null model significance testing to evaluate the performance of our ESMs (Raes and Ter Steege 2007): we compared AUC scores from 100 null models using randomly generated presence locations equal in number to those used in the developed distribution model. All ecoregion models outperformed the null expectation (p < 0.002).
    Estimating range shifts: For each of the three GCMs and each SSP scenario, we converted the probability distribution map into a binary map (0 = unsuitable, 1 = suitable) using the threshold that maximizes sensitivity and specificity (Liu et al. 2016). To create the final maps for each SSP scenario, we summed the three binary GCM layers and took a consensus approach, meaning climatically suitable areas were pixels where at least two of the three models predicted species presence (Araújo and New 2007, Piccioli Cappelli et al. 2021). We combined the future binary maps (fmap) and the present binary maps (pmap) following the formula fmap x 2 + pmap (from Huang et al., 2017) to produce maps with values of 0 (areas not suitable), 1 (areas suitable in the present but not the future), 2 (areas not suitable in the present but suitable in the future), and 3 (areas currently suitable that will remain suitable), using the raster calculator in QGIS. We then calculated the total area of suitability, area of maintenance, area of expansion, and area of contraction for each binary model using the "BIOMOD_RangeSize" function in the R package "biomod2" (Thuiller et al. 2021).
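    The fmap x 2 + pmap combination described above can be sketched in a few lines. This toy version operates on nested lists standing in for raster layers (the study used the QGIS raster calculator); the small example grids are hypothetical.

```python
def range_shift_map(pmap, fmap):
    """Combine present (pmap) and future (fmap) binary suitability
    maps with the formula fmap * 2 + pmap (Huang et al., 2017):
    0 = never suitable, 1 = contraction (present only),
    2 = expansion (future only), 3 = stable (suitable in both)."""
    return [[f * 2 + p for p, f in zip(prow, frow)]
            for prow, frow in zip(pmap, fmap)]

# Hypothetical 2x3 binary suitability grids:
present = [[1, 1, 0],
           [0, 1, 0]]
future  = [[1, 0, 0],
           [1, 1, 0]]
print(range_shift_map(present, future))
# [[3, 1, 0], [2, 3, 0]]
```

    Counting cells with each value then gives the areas of maintenance (3), contraction (1), and expansion (2) reported by tools such as BIOMOD_RangeSize.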

  14. Los Angeles Family and Neighborhood Survey (L.A.FANS), Wave 1, Restricted Data Version 3, 2000-2001

    • icpsr.umich.edu
    Updated Apr 8, 2019
    Pebley, Anne R.; Sastry, Narayan (2019). Los Angeles Family and Neighborhood Survey (L.A.FANS), Wave 1, Restricted Data Version 3, 2000-2001 [Dataset]. http://doi.org/10.3886/ICPSR37271.v1
    Dataset updated
    Apr 8, 2019
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    Authors
    Pebley, Anne R.; Sastry, Narayan
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/37271/terms

    Time period covered
    2000 - 2001
    Area covered
    Los Angeles, United States, California
    Description

    This study includes restricted data, version 3, for Wave 1 of the L.A.FANS data. To compare L.A.FANS restricted data version 3 with other restricted data versions, see the table on the series page for the L.A.FANS data. Data in this study are designed for use with the public use data files for L.A.FANS, Wave 1 (study 1). This file adds only a few variables to the L.A.FANS Wave 1 public use files. Specifically, it adds the census tract and block number for the tract each respondent lives in, and geographic coordinate data for a number of locations reported by the respondent (including home, grocery store, place of work, place of worship, schools, etc.). It also includes certain variables, thought to be sensitive, which are not available in the public use data. These variables are identified in the L.A.FANS Wave 1 Users' Guide and Codebook. Finally, some distance variables and individual characteristics that are treated in the public use data to make it harder to identify individuals are provided in untreated form in the version 3 restricted data file. Please note that L.A.FANS restricted data may only be accessed within the ICPSR Virtual Data Enclave (VDE) and must be merged with the L.A.FANS public data prior to beginning any analysis. A Users' Guide, which explains the design and how to use the samples, is available for Wave 1 at the RAND website. Additional information on the project, survey design, sample, and variables is available from:
    Sastry, Narayan, Bonnie Ghosh-Dastidar, John Adams, and Anne R. Pebley (2006). The Design of a Multilevel Survey of Children, Families, and Communities: The Los Angeles Family and Neighborhood Survey. Social Science Research, Volume 35, Number 4, Pages 1000-1024.
    The Users' Guides (Wave 1 and Wave 2)
    The RAND Documentation Reports page

  15. m

    Dataset - Performance Comparison Oracle, PostgreSQL, and MySQL Database...

    • data.mendeley.com
    Updated Sep 16, 2024
    Cite
    Raymond Setiawan (2024). Dataset - Performance Comparison Oracle, PostgreSQL, and MySQL Database Using JMeter Tools [Dataset]. http://doi.org/10.17632/f6vfr96m2p.1
    Explore at:
    Dataset updated
    Sep 16, 2024
    Authors
    Raymond Setiawan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In order to compare the performance of various Database Management Systems (DBMS), five primary tables—customer, salesman, category, qty_product, and product—were used to create an extensive test dataset. This data was then stored in two principal tables, transaction_hdr and transaction_dtl, each containing over 100,000 records. The utilization of this large dataset allows for a thorough evaluation of DBMS performance using tools such as JMeter.

    Tables Used:

    1. Customer: Stores information about customers, including customer ID, name, contact details, and address.
    2. Salesman: Contains data about sales personnel, including salesman ID, name, contact details, and address.
    3. Category_Product: Classifies products into specific categories, including category ID and category type.
    4. Qty_Product: Maintains information regarding the quantity of products available, including quantity ID, quantity name, and quantity value.
    5. Product: Details information about products, including product ID, product name, category ID, size, quantity ID, stock, and price.

    By utilizing the above tables, a substantial dataset was generated by populating the transaction_hdr and transaction_dtl tables with over 100,000 records each. The transaction_hdr table includes transaction headers with information such as transaction ID, date, customer ID, salesman ID, and total price. The transaction_dtl table records the details of each transaction, including transaction ID, product ID, product name, category ID, quantity ID, quantity value, quantity, and product price. JMeter was employed to conduct performance testing on the DBMS using this dataset to assess throughput and response time across MySQL, Oracle, and PostgreSQL databases.
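    To make the described schema concrete, here is a minimal sketch of the two transaction tables and a timed query. This is an illustration only: the study used Oracle, PostgreSQL, and MySQL driven by JMeter, whereas this sketch uses SQLite, and the exact column names are assumptions based on the description above.

    ```python
    import sqlite3
    import time

    # Hypothetical, simplified versions of transaction_hdr / transaction_dtl;
    # column names are inferred from the dataset description, not the real DDL.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("""CREATE TABLE transaction_hdr (
        transaction_id INTEGER PRIMARY KEY, trx_date TEXT,
        customer_id INTEGER, salesman_id INTEGER, total_price REAL)""")
    cur.execute("""CREATE TABLE transaction_dtl (
        transaction_id INTEGER, product_id INTEGER, product_name TEXT,
        category_id INTEGER, qty_id INTEGER, qty_value INTEGER,
        quantity INTEGER, product_price REAL)""")

    # Populate with synthetic rows (the study used >100,000 per table).
    rows_hdr = [(i, "2024-01-01", i % 500, i % 20, 10.0 * i) for i in range(1000)]
    rows_dtl = [(i, i % 300, "p", i % 10, 1, 1, 2, 5.0) for i in range(1000)]
    cur.executemany("INSERT INTO transaction_hdr VALUES (?,?,?,?,?)", rows_hdr)
    cur.executemany("INSERT INTO transaction_dtl VALUES (?,?,?,?,?,?,?,?)", rows_dtl)

    # Time a join query, analogous to the response time JMeter measures.
    start = time.perf_counter()
    cur.execute("""SELECT COUNT(*) FROM transaction_hdr h
                   JOIN transaction_dtl d ON h.transaction_id = d.transaction_id
                   WHERE h.customer_id = 42""")
    n = cur.fetchone()[0]
    elapsed = time.perf_counter() - start
    print(n, elapsed)
    ```

    JMeter would issue queries like this one concurrently against each DBMS and aggregate throughput and response time; the single timed query here only illustrates the unit of work being measured.
    
    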

  16. n

    Data from: Accommodating the role of site memory in dynamic species...

    • data.niaid.nih.gov
    • search.dataone.org
    • +2more
    zip
    Updated May 3, 2021
    Cite
    Graziella DiRenzo; David Miller; Blake Hossack; Brent Sigafus; Paige Howell; Erin Muths; Evan Grant (2021). Accommodating the role of site memory in dynamic species distribution models [Dataset]. http://doi.org/10.5061/dryad.vdncjsxs7
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 3, 2021
    Dataset provided by
    United States Geological Survey: http://www.usgs.gov/
    Pennsylvania State University
    Authors
    Graziella DiRenzo; David Miller; Blake Hossack; Brent Sigafus; Paige Howell; Erin Muths; Evan Grant
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    First-order dynamic occupancy models (FODOMs) are a class of state-space models in which the true state (occurrence) is observed imperfectly. An important assumption of FODOMs is that site dynamics only depend on the current state and that variations in dynamic processes are adequately captured with covariates or random effects. However, it is often difficult to understand and/or measure the covariates that generate ecological data, which are typically spatio-temporally correlated. Consequently, the non-independent error structure of correlated data causes underestimation of parameter uncertainty and poor ecological inference. Here, we extend the FODOM framework with a second-order Markov process to accommodate site memory when covariates are not available. Our modeling framework can be used to make reliable inference about site occupancy, colonization, extinction, turnover, and detection probabilities. We present a series of simulations to illustrate the data requirements and model performance. We then applied our modeling framework to 13 years of data from an amphibian community in southern Arizona, USA. In this analysis, we found residual temporal autocorrelation of population processes for most species, even after accounting for long-term drought dynamics. Our approach represents a valuable advance in obtaining inference on population dynamics, especially as they relate to metapopulations.
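    The core idea (a second-order Markov process for occupancy) can be sketched in a few lines. This is an illustration of the concept, not the authors' model: the transition probabilities below are made-up example values, and detection error is omitted.

    ```python
    import random

    random.seed(1)

    # P(site occupied at t | state at t-1, state at t-2): occupancy depends on
    # the previous TWO states, which is what "site memory" means here.
    p = {
        (0, 0): 0.10,  # long-empty sites colonize rarely
        (0, 1): 0.30,  # recently extinct sites recolonize faster (memory effect)
        (1, 0): 0.60,  # newly colonized sites persist moderately
        (1, 1): 0.85,  # long-occupied sites persist best
    }

    def simulate(T=10000):
        """Simulate one site for T seasons; return the realized occupancy rate."""
        z_prev2, z_prev1 = 0, 0
        occupied = 0
        for _ in range(T):
            z = 1 if random.random() < p[(z_prev1, z_prev2)] else 0
            occupied += z
            z_prev2, z_prev1 = z_prev1, z
        return occupied / T

    rate = simulate()
    print(rate)
    ```

    A first-order model would collapse the (0, 0) and (0, 1) rows (and the (1, 0) and (1, 1) rows) into single probabilities, which is exactly the restriction the paper relaxes.
    
    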

    Methods

    These files were written by: G. V. DiRenzo

    If you have any questions, please email: grace.direnzo@gmail.com

    This repository provides the code, data, and simulations to recreate all of the analysis, tables, and figures presented in the manuscript.

    In this file, we direct the user to the location of files.

    All methods can be found in the manuscript and associated supplements.

    All file paths direct the user in navigating the files in this repo.

    ######## Objective & Table of contents

    File objectives & Table of contents:

    # 1. To navigate to files explaining how to simulate and analyze data using the main text parameterization
    # 2. To navigate to files explaining how to simulate and analyze data using the alternative parameterization (hidden Markov model)
    # 3. To navigate to files that created the parameter combinations for the simulation studies
    # 4. To navigate to files used to run scenarios in the manuscript
      # 4a. Scenario 1: data generated without site memory & without site heterogeneity
      # 4b. Scenario 2: data generated with site memory & without site heterogeneity
      # 4c. Scenario 3: data generated with site memory & with site heterogeneity
    # 5. To navigate to files for general sample design guidelines
    # 6. Parameter accuracy, precision, and bias under different parameter combinations
    # 7. Model comparison under different scenarios
    # 8. To specifically navigate to code that recreates manuscript:
      # 8a. Figures
      # 8b. Tables
    # 9. To navigate to files for empirical analysis
    
    ### 1. Main text parameterization

    To see model parameterization as written in the main text, please navigate to: /MemModel/OtherCode/MemoryMod_main.R

    ### 2. Alternative parameterization

    To see alternative parameterization using a Hidden Markov Model, please navigate to: /MemModel/OtherCode/MemoryMod_HMM.R

    ### 3. Parameter Combinations

    To see how parameter combinations were generated, please navigate to: /MemModel/ParameterCombinations/LHS_parameter_combos.R

    To see stored parameter combinations for simulations, please navigate to: /MemModel/ParameterCombinations/parameter_combos_MemModel4.csv

    ### 4a. Scenario #1

    To simulate data WITHOUT memory and analyze it using the memory model and the first-order dynamic occupancy model:

    Please navigate to: /MemModel/Simulations/withoutMem/Code/
    MemoryMod_JobArray_withoutMem.R = code to simulate & analyze data
    MemoryMod_JA1.sh = file to run simulations 1-5000 on HPC
    MemoryMod_JA2.sh = file to run simulations 5001-10000 on HPC

    All model output is stored in: /MemModel/Simulations/withoutMem/ModelOutput

    ### 4b. Scenario #2

    To simulate data WITH memory and analyze it using the memory model and the first-order dynamic occupancy model:

    Please navigate to: /MemModel/Simulations/withMem/Code/
    MemoryMod_JobArray_withMem.R = code to simulate & analyze data
    MemoryMod_JA1.sh = file to run simulations 1-5000 on HPC
    MemoryMod_JA2.sh = file to run simulations 5001-10000 on HPC

    All model output is stored in: /MemModel/Simulations/withMem/ModelOutput

    ### 4c. Scenario #3

    To simulate data WITH memory and WITH site heterogeneity, and analyze it using the memory model and the first-order dynamic occupancy model:

    Please navigate to: /MemModel/Simulations/Hetero/Code/
    MemoryMod_JobArray_Hetero.R = code to simulate & analyze data
    MemoryMod_JA1.sh = file to run simulations 1-5000 on HPC
    MemoryMod_JA2.sh = file to run simulations 5001-10000 on HPC

    All model output is stored in: /MemModel/Simulations/Hetero/ModelOutput

    ### 5. General sample design guidelines

    To see methods for the general sample design guidelines, please navigate to: /MemModel/PostProcessingCode/Sampling_design_guidelines.R

    ### 6. Parameter accuracy, precision, and bias under different parameter combinations

    To see methods for model performance under different parameter combinations, please navigate to: /MemModel/PostProcessingCode/Parameter_precison_accuracy_bias.R

    ### 7. Comparison of model performance

    To see methods for model comparison, please navigate to: /MemModel/PostProcessingCode/ModelComparison.R

    ### 8a. Manuscript Figures

    To create parts of Figure 1 of main text (case study): - Fig 1D & 1E: /MemModel/EmpiricalAnalysis/Code/Analysis/AZ_CaseStudy.R

    To create Figure 2 of main text (Comparison across simulation scenarios): - /MemModel/PostProcessingCode/ModelComparison.R

    To create Figure S1, S2, & S3 use file: - /MemModel/PostProcessingCode/Parameter_precison_accuracy_bias.R

    To create Figure S4 & S5 use file: - /MemModel/PostProcessingCode/ModelComparison.R

    ### 8b. Manuscript Tables

    To create Table 1 of main text (General sampling recommendations): - /MemModel/PostProcessingCode/Sampling_design_guidelines.R

    To create Table S1: - /MemModel/PostProcessingCode/Parameter_precison_accuracy_bias.R

    To create Table S2: - /MemModel/EmpiricalAnalysis/Code/Analysis/AZ_CaseStudy.R

    To create Table S3: - /MemModel/PostProcessingCode/ModelComparison.R

    To create Table S4 & S5: - /MemModel/EmpiricalAnalysis/Code/Analysis/AZ_CaseStudy.R

    ### 9. Empirical analysis

    To recreate the empirical analysis of the case study, please navigate to: - /MemModel/EmpiricalAnalysis/Code/Analysis/AZ_CaseStudy.R

  17. Myket Android Application Install Dataset

    • zenodo.org
    bin, csv
    Updated Aug 23, 2023
    + more versions
    Cite
    Erfan Loghmani; MohammadAmin Fazli; Erfan Loghmani; MohammadAmin Fazli (2023). Myket Android Application Install Dataset [Dataset]. http://doi.org/10.48550/arxiv.2308.06862
    Explore at:
    bin, csvAvailable download formats
    Dataset updated
    Aug 23, 2023
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Erfan Loghmani; MohammadAmin Fazli; Erfan Loghmani; MohammadAmin Fazli
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains information on application install interactions of users in the Myket android application market. The dataset was created for the purpose of evaluating interaction prediction models, requiring user and item identifiers along with timestamps of the interactions. Hence, the dataset can be used for interaction prediction and building a recommendation system. Furthermore, the data forms a dynamic network of interactions, and we can also perform network representation learning on the nodes in the network, which are users and applications.

    Data Creation

    The dataset was initially generated by the Myket data team and later cleaned and subsampled by Erfan Loghmani, at the time a master's student at Sharif University of Technology. The data team focused on a two-week period and randomly sampled 1/3 of the users with interactions during that period. They then selected install and update interactions for three months before and after the two-week period, resulting in interactions spanning about six months and two weeks.

    We further subsampled and cleaned the data to focus on application download interactions. We identified the top 8000 most installed applications and selected interactions related to them. We retained users with more than 32 interactions, resulting in 280,391 users. From this group, we randomly selected 10,000 users, and the data was filtered to include only interactions for these users. The detailed procedure can be found here.

    Data Structure

    The dataset has two main files.

    • myket.csv: This file contains the interaction information and follows the same format as the datasets used in the "JODIE: Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks" (ACM SIGKDD 2019) project. However, this data does not contain state labels and interaction features, resulting in associated columns being all zero.
    • app_info_sample.csv: This file comprises features associated with applications present in the sample. For each individual application, information such as the approximate number of installs, average rating, count of ratings, and category are included. These features provide insights into the applications present in the dataset.
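    A JODIE-style interaction file can be parsed with the standard library alone. The snippet below uses a tiny inlined sample in place of the real myket.csv (which has 694,121 rows); the column names are illustrative assumptions about the format, with the state-label and feature columns all zero as noted above.

    ```python
    import csv
    import io

    # Tiny stand-in for myket.csv in the JODIE-style layout described above.
    sample = """user_id,item_id,timestamp,state_label,feature0
    0,com.instagram.android,1.0,0,0
    0,com.whatsapp,5.5,0,0
    1,com.instagram.android,2.0,0,0
    """

    # Strip the indentation introduced by the triple-quoted literal.
    text = "\n".join(line.strip() for line in sample.strip().splitlines())

    interactions = list(csv.DictReader(io.StringIO(text)))
    users = {row["user_id"] for row in interactions}
    apps = {row["item_id"] for row in interactions}
    print(len(interactions), len(users), len(apps))
    ```

    On the real file, the same loop yields the user/item/interaction counts listed under "Dataset Details" below; for recommendation experiments, rows are typically sorted by timestamp and split chronologically.
    
    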

    Dataset Details

    • Total Instances: 694,121 install interaction instances
    • Instances Format: Triplets of user_id, app_name, timestamp
    • 10,000 users and 7,988 android applications
    • Item features for 7,606 applications

    For a detailed summary of the data's statistics, including information on users, applications, and interactions, please refer to the Python notebook available at summary-stats.ipynb. The notebook provides an overview of the dataset's characteristics and can be helpful for understanding the data's structure before using it for research or analysis.

    Top 20 Most Installed Applications

    Package Name | Count of Interactions
    com.instagram.android | 15292
    ir.resaneh1.iptv | 12143
    com.tencent.ig | 7919
    com.ForgeGames.SpecialForcesGroup2 | 7797
    ir.nomogame.ClutchGame | 6193
    com.dts.freefireth | 6041
    com.whatsapp | 5876
    com.supercell.clashofclans | 5817
    com.mojang.minecraftpe | 5649
    com.lenovo.anyshare.gps | 5076
    ir.medu.shad | 4673
    com.firsttouchgames.dls3 | 4641
    com.activision.callofduty.shooter | 4357
    com.tencent.iglite | 4126
    com.aparat | 3598
    com.kiloo.subwaysurf | 3135
    com.supercell.clashroyale | 2793
    co.palang.QuizOfKings | 2589
    com.nazdika.app | 2436
    com.digikala | 2413

    Comparison with SNAP Datasets

    The Myket dataset introduced in this repository exhibits distinct characteristics compared to the real-world datasets used by the project. The table below provides a comparative overview of the key dataset characteristics:

    Dataset | #Users | #Items | #Interactions | Average Interactions per User | Average Unique Items per User
    Myket | 10,000 | 7,988 | 694,121 | 69.4 | 54.6
    LastFM | 980 | 1,000 | 1,293,103 | 1,319.5 | 158.2
    Reddit | 10,000 | 984 | 672,447 | 67.2 | 7.9
    Wikipedia | 8,227 | 1,000 | 157,474 | 19.1 | 2.2
    MOOC | 7,047 | 97 | 411,749 | 58.4 | 25.3
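    The per-user column is simply the interaction count divided by the user count, which is easy to verify for the Myket row:

    ```python
    # Sanity-check the Myket row of the table above.
    users, items, interactions = 10_000, 7_988, 694_121
    avg_per_user = interactions / users
    print(round(avg_per_user, 1))  # 69.4, matching the table
    ```
    
    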

    The Myket dataset stands out by having an ample number of both users and items, highlighting its relevance for real-world, large-scale applications. Unlike LastFM, Reddit, and Wikipedia datasets, where users exhibit repetitive item interactions, the Myket dataset contains a comparatively lower amount of repetitive interactions. This unique characteristic reflects the diverse nature of user behaviors in the Android application market environment.

    Citation

    If you use this dataset in your research, please cite the following preprint:

    @misc{loghmani2023effect,
       title={Effect of Choosing Loss Function when Using T-batching for Representation Learning on Dynamic Networks}, 
       author={Erfan Loghmani and MohammadAmin Fazli},
       year={2023},
       eprint={2308.06862},
       archivePrefix={arXiv},
       primaryClass={cs.LG}
    }
    
  18. COVID-19 Case Surveillance Public Use Data

    • data.cdc.gov
    • opendatalab.com
    • +5more
    application/rdfxml +5
    Updated Jul 9, 2024
    + more versions
    Cite
    CDC Data, Analytics and Visualization Task Force (2024). COVID-19 Case Surveillance Public Use Data [Dataset]. https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data/vbim-akqf
    Explore at:
    application/rdfxml, tsv, csv, json, xml, application/rssxmlAvailable download formats
    Dataset updated
    Jul 9, 2024
    Dataset provided by
    Centers for Disease Control and Prevention: http://www.cdc.gov/
    Authors
    CDC Data, Analytics and Visualization Task Force
    License

    https://www.usa.gov/government-works

    Description

    Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.

    Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.

    This case surveillance public use dataset has 12 elements for all COVID-19 cases shared with CDC and includes demographics, any exposure history, disease severity indicators and outcomes, and presence of any underlying medical conditions and risk behaviors, but no geographic data.

    CDC has three COVID-19 case surveillance datasets:

    The following apply to all three datasets:

    Overview

    The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification (Interim-20-ID-02). The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported voluntarily to CDC.

    For more information: NNDSS Supports the COVID-19 Response | CDC.

    The deidentified data in the “COVID-19 Case Surveillance Public Use Data” include demographic characteristics, any exposure history, disease severity indicators and outcomes, clinical data, laboratory diagnostic test results, and presence of any underlying medical conditions and risk behaviors. All data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.

    COVID-19 Case Reports

    COVID-19 case reports have been routinely submitted using nationally standardized case reporting forms. On April 5, 2020, CSTE released an Interim Position Statement with national surveillance case definitions for COVID-19 included. Current versions of these case definitions are available here: https://ndc.services.cdc.gov/case-definitions/coronavirus-disease-2019-2021/.

    All cases reported on or after were requested to be shared by public health departments to CDC using the standardized case definitions for laboratory-confirmed or probable cases. On May 5, 2020, the standardized case reporting form was revised. Case reporting using this new form is ongoing among U.S. states and territories.

    Data are Considered Provisional

    • The COVID-19 case surveillance data are dynamic; case reports can be modified at any time by the jurisdictions sharing COVID-19 data with CDC. CDC may update prior cases shared with CDC based on any updated information from jurisdictions. For instance, as new information is gathered about previously reported cases, health departments provide updated data to CDC. As more information and data become available, analyses might find changes in surveillance data and trends during a previously reported time window. Data may also be shared late with CDC due to the volume of COVID-19 cases.
    • Annual finalized data: To create the final NNDSS data used in the annual tables, CDC works carefully with the reporting jurisdictions to reconcile the data received during the year until each state or territorial epidemiologist confirms that the data from their area are correct.
    • Access Addressing Gaps in Public Health Reporting of Race and Ethnicity for COVID-19, a report from the Council of State and Territorial Epidemiologists, to better understand the challenges in completing race and ethnicity data for COVID-19 and recommendations for improvement.

    Data Limitations

    To learn more about the limitations in using case surveillance data, visit FAQ: COVID-19 Data and Surveillance.

    Data Quality Assurance Procedures

    CDC’s Case Surveillance Section routinely performs data quality assurance procedures (i.e., ongoing corrections and logic checks to address data errors). To date, the following data cleaning steps have been implemented:

    • Questions that have been left unanswered (blank) on the case report form are reclassified to a Missing value, if applicable to the question. For example, in the question “Was the individual hospitalized?” where the possible answer choices include “Yes,” “No,” or “Unknown,” the blank value is recoded to Missing because the case report form did not include a response to the question.
    • Logic checks are performed for date data. If an illogical date has been provided, CDC reviews the data with the reporting jurisdiction. For example, if a symptom onset date in the future is reported to CDC, this value is set to null until the reporting jurisdiction updates the date appropriately.
    • Additional data quality processing to recode free text data is ongoing. Data on symptoms, race and ethnicity, and healthcare worker status have been prioritized.
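    The first two cleaning rules above can be sketched as a small record-cleaning function. This is a hedged illustration of the described logic, not CDC's actual pipeline; the field names and the reference date are assumptions.

    ```python
    from datetime import date

    # Illustrative reference date for the "future onset date" check.
    TODAY = date(2024, 7, 9)

    def clean(record):
        """Apply the two rules described above to one case record (a dict)."""
        cleaned = dict(record)
        # Rule 1: blank answers are reclassified to "Missing".
        if cleaned.get("hospitalized", "") == "":
            cleaned["hospitalized"] = "Missing"
        # Rule 2: an onset date in the future is set to null pending review.
        onset = cleaned.get("onset_date")
        if onset is not None and onset > TODAY:
            cleaned["onset_date"] = None
        return cleaned

    rec = clean({"hospitalized": "", "onset_date": date(2025, 1, 1)})
    print(rec)
    ```
    
    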

    Data Suppression

    To prevent release of data that could be used to identify people, data cells are suppressed for low frequency (<5) records and indirect identifiers (e.g., date of first positive specimen). Suppression includes rare combinations of demographic characteristics (sex, age group, race/ethnicity). Suppressed values are re-coded to the NA answer option; records with data suppression are never removed.
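    The suppression rule (recode rather than remove) can be expressed in one line over a table of cell counts. Again, a sketch with illustrative field names, not CDC code:

    ```python
    # Counts below the threshold are re-coded to "NA"; rows are never dropped.
    def suppress(cell_counts, threshold=5):
        return {k: (v if v >= threshold else "NA") for k, v in cell_counts.items()}

    # Keys are (sex, age group) demographic cells; values are case counts.
    table = suppress({("F", "0-9"): 3, ("F", "10-19"): 120, ("M", "0-9"): 4})
    print(table)
    ```
    
    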

    For questions, please contact Ask SRRG (eocevent394@cdc.gov).

    Additional COVID-19 Data

    COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths by state and by county. These

  19. e

    Analysis of the Neighborhood Parameter on Outlier Detection Algorithms -...

    • b2find.eudat.eu
    Updated Nov 21, 2024
    Cite
    (2024). Analysis of the Neighborhood Parameter on Outlier Detection Algorithms - Evaluation Tests - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/97061c16-018f-5d82-9125-2217026d9480
    Explore at:
    Dataset updated
    Nov 21, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of the Neighborhood Parameter on Outlier Detection Algorithms - Evaluation Tests conducted for the paper: Impact of the Neighborhood Parameter on Outlier Detection Algorithms, by F. Iglesias, C. Martínez, T. Zseby.

    Context and methodology: A significant number of anomaly detection algorithms base their distance and density estimates on neighborhood parameters (usually referred to as k). The experiments in this repository analyze how five different SoTA algorithms (kNN, LOF, LoOP, ABOD and SDO) are affected by variations in k in combination with different alterations that the data may undergo in relation to: cardinality, dimensionality, global outlier ratio, local outlier ratio, layers of density, inliers-outliers density ratio, and zonification. Evaluations are conducted with accuracy measurements (ROC-AUC, adjusted Average Precision, and Precision at n) and runtimes. This repository is framed within research on the following domains: algorithm evaluation, outlier detection, anomaly detection, unsupervised learning, machine learning, data mining, data analysis. Datasets and algorithms can be used for experiment replication and for further evaluation and comparison.

    Technical details: Experiments are in Python 3 (tested with v3.9.6). The provided scripts generate all data and results; we keep them in the repo for the sake of comparability and replicability. The file and folder structure is as follows:

    • results_datasets_scores.zip: contains all results and plots as shown in the paper, also the generated datasets and files with anomaly
    • dependencies.sh: installs required Python packages in a clean environment.
    • generate_data.py: creates the experimental datasets.
    • outdet.py: runs outlier detection with ABOD, kNN, LOF, LoOP and SDO over the collection of datasets.
    • indices.py: contains functions implementing accuracy indices.
    • explore_results.py: parses results obtained with outlier detection algorithms to create comparison plots and a table with optimal ks.
    • test_kfc.py: runs KFC tests for finding the optimal k in a collection of datasets. It requires kfc.py, which is not included in this repo and must be downloaded from https://github.com/TimeIsAFriend/KFC. kfc.py implements the KFCS and KFCR methods for finding the optimal k as presented in [1].
    • explore_kfc.py: parses results obtained with the KFCS and KFCR methods to create LaTeX tables.
    • README.md: provides explanations and step-by-step instructions for replication.

    References: [1] Jiawei Yang, Xu Tan, Sylwan Rahardja, "Outlier detection: How to select k for k-nearest-neighbors-based outlier detectors," Pattern Recognition Letters, Volume 174, 2023, Pages 112-117, ISSN 0167-8655, https://doi.org/10.1016/j.patrec.2023.08.020.

    License: The CC-BY license applies to all data generated with the "generate_data.py" script. All distributed code is under the GNU GPL license.
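    To see concretely why the neighborhood parameter matters, here is a minimal kNN outlier scorer (score = distance to the k-th nearest neighbor). This is a generic illustration of the technique the repository evaluates, not the repository's own code:

    ```python
    import math

    def knn_scores(points, k):
        """Outlier score of each point: distance to its k-th nearest neighbor."""
        scores = []
        for i, p in enumerate(points):
            dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
            scores.append(dists[k - 1])
        return scores

    # A tight cluster plus one global outlier.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    for k in (1, 3):
        scores = knn_scores(pts, k)
        print(k, scores.index(max(scores)))  # index of the top-scored point
    ```

    Here the global outlier is ranked first for both k values, but with local outliers, varying densities, or different cardinalities the ranking can change with k, which is exactly the sensitivity these evaluation tests measure.
    
    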

  20. u

    Data from: Enabling proteomic studies with RNA-Seq: the proteome of tomato...

    • agdatacommons.nal.usda.gov
    xls
    Updated Apr 30, 2025
    Cite
    Gloria Lopez-Casado; Paul A. Covey; Patricia A. Bedinger; Lukas A. Mueller; Theodore W. Thannhauser; Sheng Zhang; Zhangjun Fei; James J. Giovannoni; Jocelyn K.C. Rose (2025). Data from: Enabling proteomic studies with RNA-Seq: the proteome of tomato pollen as a test case [Dataset]. http://doi.org/10.15482/USDA.ADC/1177461
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Apr 30, 2025
    Dataset provided by
    PROTEOMICS
    Authors
    Gloria Lopez-Casado; Paul A. Covey; Patricia A. Bedinger; Lukas A. Mueller; Theodore W. Thannhauser; Sheng Zhang; Zhangjun Fei; James J. Giovannoni; Jocelyn K.C. Rose
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    The data set included here is the supporting material to the article, "Enabling proteomic studies with RNA-Seq: The proteome of tomato pollen as a test case." 2012. Researchers conducted a quantitative analysis of the proteomes of pollen from domesticated tomato (Solanum lycopersicum) and two wild relatives that exhibit differences in mating systems and in inter-specific reproductive barriers. Using a custom tomato RNA-Seq database created through 454 pyrosequencing, more than 1200 proteins were identified, with subsets showing expression differences between genotypes or in the accumulation of the corresponding transcripts. Importantly, no major qualitative or quantitative differences were observed in the characterized proteomes when mass spectra were used to interrogate either a highly curated community database of tomato DNA sequences generated through traditional sequencing technologies, or the RNA-Seq database. They conclude that RNA-Seq provides a cost-effective and robust platform for protein identification and will be increasingly valuable to the field of proteomics. This dataset consists of 10 tables containing protein, peptide, and spectral data.

    Resources in this dataset:
    Resource Title: Lists and classification of tomato pollen proteins from 454 experiments: Tables S1 to S10 (.xls file).
    File Name: pmic7005-sup-0001-tables1.xls
    Resource Description: This spreadsheet, "pmic7005-sup-0001-tables1.xls", contains the ten tables described at the publisher's site. At the publisher's site, it is linked ten times, once for each table. It is listed once here in this record. List of tables:

    Table S1. Statistical analyses related to average length of RNA sequences obtained from 454 sequencing experiments performed to create the different 454 databases

    Table S2. Raw data from peptides identified in the 3 replicates (R1, R2, R3) using the 3 databases SGN, 454db1, and 454db2. S2a: SGNdb_R1; S2b: SGNdb_R2; S2c: SGNdb_R3; S2d: 454db1_R1; S2e: 454db1_R2; S2f: 454db1_R3; S2g: 454db2_R1; S2h: 454db2_R2; S2i: 454db2_R3. (file "LopezCasado_Proteomics_SuppTable_S2" at ftp://ted.bti.cornell.edu/pub/tomato_454_unigene)

    Table S3. Numbers of proteins, peptides, and identified spectra in the tomato pollen proteome using SGN db and 454-generated databases (454 db1 and 454 db2), considering biological modifications (bm) as an option when performing the search against each specific database. R1, R2, and R3 indicate three independent biological replicates

    Table S4. Number of proteins, distinct peptides, and spectra identified at three false discovery rate (FDR) levels

    Table S5. Functional classification of tomato pollen proteins using iTRAQ and SGN db interrogation

    Table S6. Functional classification of tomato pollen proteins using iTRAQ and 454 db2 interrogation

    Table S7. Proteins differentially expressed in one specific genotype and found in one or more biological replicates using SGN db

    Table S8. Proteins differentially expressed in one specific genotype and found in one or more biological replicates using 454 db2

    Table S9. Proteins differentially expressed in any specific genotype, identified using both SGN db and 454 db2. Each protein showed a >2-fold expression difference (p < 0.05) in one genotype compared with one other genotype, and was detected in at least two of the three iTRAQ experiment replicates. Ratio refers to relative abundance between the genotypes indicated. SD, standard deviation

    Note: Tables S2, S3, and S10 are missing from this and the source version of the spreadsheet; the contact has acknowledged this (04/2015).
    Resource Title: Data Dictionary - Lists and classification of tomato pollen proteins from 454 experiments.
    File Name: Data Dictionary - data enabling proteomic studies rna seq.csv

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
The Cancer Imaging Archive, Standardization in Quantitative Imaging: A Multi-center Comparison of Radiomic Feature Values [Dataset]. http://doi.org/10.7937/tcia.2020.9era-gg29

Standardization in Quantitative Imaging: A Multi-center Comparison of Radiomic Feature Values

Radiomic-Feature-Standards

Explore at:
Available download formats: xlsx, n/a, nifti and zip
Dataset authored and provided by
The Cancer Imaging Archive
License

https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

Time period covered
Jun 9, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description

This dataset was used by the NCI's Quantitative Imaging Network (QIN) PET-CT Subgroup for their project titled "Multi-center Comparison of Radiomic Features from Different Software Packages on Digital Reference Objects and Patient Datasets." The purpose of this project was to assess the agreement among radiomic features when computed by several groups using different software packages under tightly controlled conditions, which included common image datasets and standardized feature definitions. The image datasets (and Volumes of Interest, VOIs) provided here are the same ones used in that project and reported in the publication listed below (ISSN 2379-1381, https://doi.org/10.18383/j.tom.2019.00031). In addition, we have provided detailed information about the software packages used (Table 1 in that publication) as well as the individual feature values for each image dataset and each software package, which were used to create the summary tables (Tables 2, 3, and 4) in that publication.

For that project, nine common quantitative imaging features were selected for comparison, including features that describe morphology, intensity, shape, and texture. These features are described in detail by the Image Biomarker Standardisation Initiative (IBSI; https://arxiv.org/abs/1612.07003, and Zwanenburg A, Vallières M, et al. The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-based Phenotyping. Radiology. 2020 May;295(2):328-338. doi: https://doi.org/10.1148/radiol.2020191145). There are three datasets provided: two image datasets and one dataset consisting of four Excel spreadsheets containing feature values.

  1. The first image dataset is a set of three Digital Reference Objects (DROs) used in the project: (a) a sphere with uniform intensity, (b) a sphere with intensity variation, and (c) a nonspherical (but mathematically defined) object with uniform intensity. These DROs were created by the team at Stanford University, are described in (Jaggi A, Mattonen SA, McNitt-Gray M, Napel S. Stanford DRO Toolkit: digital reference objects for standardization of radiomic features. Tomography. 2019;6:–.), and are a subset of the DROs described in the DRO Toolkit. Each DRO is represented in both DICOM and NIfTI format, and the VOI was provided in each format as well (DICOM Segmentation Object (DSO) and NIfTI segmentation boundary).
  2. The second image dataset is the set of 10 patient CT scans, originating from the LIDC-IDRI dataset, that were used in the QIN multi-site collection of Lung CT data with Nodule Segmentations project (https://doi.org/10.7937/K9/TCIA.2015.1BUVFJR7). In that QIN study, a single lesion from each case was identified for analysis, and nine VOIs were then generated using three repeat runs of three segmentation algorithms (one from each of three academic institutions) on each lesion. To eliminate one source of variability in our project, only one of the VOIs previously created for each lesion was identified, and all sites used that same VOI definition. The specific VOI chosen for each lesion was the first run of the first algorithm (algorithm 1, run 1). DICOM images were provided for each dataset, and the VOI was provided in both DICOM Segmentation Object (DSO) and NIfTI segmentation formats.
  3. The third dataset is a collection of four Excel spreadsheets, each of which contains detailed information corresponding to one of the four tables in the cited publication (https://doi.org/10.18383/j.tom.2019.00031), including the raw feature values behind the summaries in Tables 2, 3, and 4. These spreadsheets are:
Software Package details: This table contains detailed information about the software packages used in the study (and listed in Table 1 in the publication), including version number and any parameters specified in the calculation of the reported features.
DRO results: This contains the original feature values obtained by each software package for each DRO, as well as the table summarizing results across software packages (Table 2 in the publication).
Patient Dataset results: This contains the original feature values for each software package for each patient dataset (one lesion per case), as well as the table summarizing results across software packages and patient datasets (Table 3 in the publication).
Harmonized GLCM Entropy results: This contains the values of the “Harmonized” GLCM Entropy feature for each patient dataset and each software package, as well as the summary across software packages (Table 4 in the publication).
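To illustrate the kind of texture feature compared across packages, the sketch below computes a gray-level co-occurrence matrix (GLCM) entropy for a toy image. This is a minimal illustration under stated assumptions (a single pixel offset, intensities already discretized to a few gray levels, base-2 logarithm); it is not the harmonized definition used in the study, and the function and parameter names are our own.

```python
import numpy as np

def glcm_entropy(image, levels=4, offset=(0, 1)):
    """Entropy of a single-offset GLCM for a 2D image of integer gray levels.

    Illustrative sketch only: real radiomics packages aggregate over
    multiple offsets/directions and follow the IBSI discretization rules.
    """
    dr, dc = offset
    glcm = np.zeros((levels, levels), dtype=float)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                glcm[image[r, c], image[r2, c2]] += 1  # count co-occurring pair
    glcm /= glcm.sum()            # normalize counts to joint probabilities
    p = glcm[glcm > 0]            # drop zero cells to avoid log(0)
    return float(-np.sum(p * np.log2(p)))  # Shannon entropy in bits

# Toy 4x4 image with four gray levels in 2x2 blocks.
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [2, 2, 3, 3],
                [2, 2, 3, 3]])
print(round(glcm_entropy(img), 3))  # → 2.585 (six equiprobable pairs, log2(6))
```

Because the six distinct horizontal pixel pairs occur equally often, the entropy equals log2(6); differences between software packages typically arise from choices such as discretization, offset aggregation, and logarithm base, which is what the harmonized comparison controls for.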
