100+ datasets found
  1. DANI

    • huggingface.co
    Updated Aug 2, 2025
    Cite
    Liu (2025). DANI [Dataset]. https://huggingface.co/datasets/Renyang/DANI
    Dataset updated
    Aug 2, 2025
    Authors
    Liu
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    DANI: Discrepancy Assessing for Natural and AI Images

    Paper: D-Judge: How Far Are We? Evaluating the Discrepancies Between AI-synthesized Images and Natural Images through Multimodal Guidance
    Code: https://github.com/RenyangLiu/DJudge
    A Large-Scale Dataset for Visual Research on AI-Synthesized and Natural Images

      Overview
    

    DANI (Discrepancy Assessing for Natural and AI Images) is a large-scale, multimodal dataset for benchmarking and broad visual research on both… See the full description on the dataset page: https://huggingface.co/datasets/Renyang/DANI.

  2. Dimensions of uncertainty: a spatiotemporal review of five COVID-19 datasets...

    • tandf.figshare.com
    • datasetcatalog.nlm.nih.gov
    rtf
    Updated Mar 28, 2024
    Cite
    Dylan Halpern; Qinyun Lin; Ryan Wang; Stephanie Yang; Steve Goldstein; Marynia Kolak (2024). Dimensions of uncertainty: a spatiotemporal review of five COVID-19 datasets [Dataset]. http://doi.org/10.6084/m9.figshare.16866907.v1
    Available download formats: rtf
    Dataset updated
    Mar 28, 2024
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Dylan Halpern; Qinyun Lin; Ryan Wang; Stephanie Yang; Steve Goldstein; Marynia Kolak
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    COVID-19 surveillance across the United States is essential to tracking and mitigating the pandemic, but data representing cases and deaths may be impacted by attribute, spatial, and temporal uncertainties. COVID-19 case and death data are essential to understanding the pandemic and serve as key inputs for prediction models that inform policy decisions; consistent information across datasets is critical to ensuring coherent findings. We implement an exploratory data analytic approach to characterize, synthesize, and visualize spatial-temporal dimensions of uncertainty across commonly used datasets for case and death metrics (Johns Hopkins University, the New York Times, USAFacts, and 1Point3Acres). We scrutinize data consistency to assess where and when disagreements occur, potentially indicating underlying uncertainty. We observe differences in cumulative case and death rates to highlight discrepancies and identify spatial patterns. Data are assessed using pairwise agreement (Cohen’s kappa) and agreement across all datasets (Fleiss’ kappa) to summarize changes over time. Findings suggest the highest agreement between the CDC, JHU, and NYT datasets. We find nine discrete type-components of information uncertainty for COVID-19 datasets, reflecting various complex processes. Understanding processes and indicators of uncertainty in COVID-19 data reporting is especially relevant to public health professionals and policymakers seeking to accurately understand and communicate information about the pandemic.
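The pairwise agreement statistic the abstract mentions (Cohen's kappa) is chance-corrected agreement between two raters. As an illustrative sketch only (not the authors' code; the rating labels are hypothetical stand-ins for two data sources classifying the same county-periods):

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected pairwise agreement between two raters.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected
    comes from the raters' marginal label frequencies. Assumes at least one
    disagreement is possible (p_expected < 1).
    """
    n = len(ratings_a)
    # Proportion of items on which the two raters agree outright.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical example: two sources labeling four periods as "up"/"down".
kappa = cohens_kappa(["up", "down", "up", "up"], ["up", "down", "down", "up"])
```

Fleiss' kappa generalizes the same observed-versus-expected comparison to more than two raters, which is how the paper summarizes agreement across all datasets at once.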

  3. Similarities Experimental Dataset

    • kaggle.com
    zip
    Updated Jun 21, 2024
    Cite
    Md Hossan R. (2024). Similarities Experimental Dataset [Dataset]. https://www.kaggle.com/datasets/mdhossanr/experimental-file
    Available download formats: zip (266144 bytes)
    Dataset updated
    Jun 21, 2024
    Authors
    Md Hossan R.
    Description

    The datasets were generated as submission files during the "Classification with an Academic Success Dataset (Playground Series S-4 E-6)" competition [https://www.kaggle.com/competitions/playground-series-s4e6/code]. Later, I used these datasets to find the similarities between the two files, which can be seen in this notebook [https://www.kaggle.com/code/mdhossanr/similarities-2-csv-files].

    It would be great if you also contribute to the notebook :) Happy Kaggling :)

  4. Social Media Engagement Report

    • kaggle.com
    zip
    Updated Apr 13, 2024
    Cite
    Ali Reda Elblgihy (2024). Social Media Engagement Report [Dataset]. https://www.kaggle.com/datasets/aliredaelblgihy/social-media-engagement-report
    Available download formats: zip (49114657 bytes)
    Dataset updated
    Apr 13, 2024
    Authors
    Ali Reda Elblgihy
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    *****Documentation Process*****

    1. Data Preparation:
       - Upload the data into Power Query to assess quality and identify duplicate values, if any.
       - Verify data quality and types for each column, addressing any miswriting or inconsistencies.
    2. Data Management:
       - Duplicate the original data sheet for future reference and label the new sheet as the "Working File" to preserve the integrity of the original dataset.
    3. Understanding Metrics:
       - Clarify the meaning of column headers, particularly distinguishing between Impressions and Reach, and understand how Engagement Rate is calculated.
       - Engagement Rate formula: total likes, comments, and shares divided by Reach.
    4. Data Integrity Assurance:
       - Recognize that Impressions should outnumber Reach, reflecting total views versus unique audience size.
       - Investigate discrepancies between Reach and Impressions to ensure data integrity, identifying and resolving root causes for accurate reporting and analysis.
    5. Data Correction:
       - Collaborate with the relevant team to rectify data inaccuracies, specifically the discrepancy between Impressions and Reach, and understand its root cause.
       - Identify instances where Reach surpasses Impressions, potentially attributable to data transformation errors.
       - After rectification, carefully adjust the dataset to reflect the corrected Impressions and Reach values, then recalculate the Engagement Rate to uphold the credibility of the analysis.
    6. Data Enhancement:
       - Categorize Audience Age into three groups within a new column named "Age Group": "Senior Adults" (45+ years), "Mature Adults" (31-45 years), and "Adolescent Adults" (<30 years).
       - Split date and time into separate columns using the text-to-columns option for improved analysis.
    7. Temporal Analysis:
       - Introduce a new "Weekday Type" column distinguishing weekends from weekdays, to discern patterns and trends in engagement.
       - Define time periods by categorizing into "Morning," "Afternoon," "Evening," and "Night" based on time intervals.
    8. Sentiment Analysis:
       - Populate blank cells in the Sentiment column with "Mixed Sentiment," denoting content containing both positive and negative sentiments or ambiguity.
    9. Geographical Analysis:
       - Group countries and obtain continent data from an online source (e.g., https://statisticstimes.com/geography/countries-by-continents.php).
       - Add a new "Audience Continent" column and use the XLOOKUP function to retrieve the corresponding continent for each country.
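The Engagement Rate formula (step 3) and the age buckets (step 6) can be restated in a few lines. The report itself is Excel/Power Query based, so this Python sketch is only an illustration of the same formulas; note one assumption, flagged in the comments, about how to resolve the stated bucket edges:

```python
def engagement_rate(likes, comments, shares, reach):
    # Step 3: Engagement Rate = (total likes + comments + shares) / Reach.
    return (likes + comments + shares) / reach

def age_group(age):
    # Step 6 buckets, names as given in the report. Assumption: the stated
    # ranges ("45+", "31-45", "<30") overlap at 45 and omit 30; here we put
    # 45 in "Senior Adults" and 30 in "Adolescent Adults".
    if age >= 45:
        return "Senior Adults"
    if age >= 31:
        return "Mature Adults"
    return "Adolescent Adults"
```

For example, a post with 10 likes, 5 comments, and 5 shares reaching 400 unique users has an engagement rate of 20/400 = 5%.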

    *****Drawing Conclusions and Providing a Summary*****

    • The data is equally distributed across different categories, platforms, and over the years.
    • Most of our audience comprises senior adults (aged 45 and above).
    • Most of our audience exhibits mixed sentiments about our posts; however, an equal portion expresses consistent sentiments.
    • The majority of our posts had audiences located in Africa.
    • The number of posts increased from the first year to the second year and remained relatively consistent in the third year.
    • The optimal time for posting is at night on weekdays.
    • The highest engagement rates were observed in Croatia, followed by Malawi.
    • The number of posts targeting senior adults is significantly higher than for the other two categories. However, the engagement rates for mature and adolescent adults are also noteworthy relative to the number of targeted posts.
  5. Img-Diff

    • huggingface.co
    Updated Aug 9, 2024
    Cite
    Data-Juicer (2024). Img-Diff [Dataset]. https://huggingface.co/datasets/datajuicer/Img-Diff
    Dataset updated
    Aug 9, 2024
    Dataset authored and provided by
    Data-Juicer
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models

    We release Img-Diff, a high-quality synthesis dataset focused on describing object differences for MLLMs. See more details in our paper and code.

    Abstract: High-performance Multimodal Large Language Models (MLLMs) rely heavily on data quality. This study introduces a novel dataset named Img-Diff, designed to enhance fine-grained image recognition in MLLMs by leveraging insights from contrastive learning and… See the full description on the dataset page: https://huggingface.co/datasets/datajuicer/Img-Diff.

  6. Data from: Natal experience and pre-breeding environmental conditions affect...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    rtf
    Updated Sep 7, 2021
    Cite
    Joseph B. Burant; Eric Heisey; Nathaniel T. Wheelwright; Amy E. M. Newman; Shannon Whelan; Daniel J. Mennill; Stéphanie M. Doucet; Greg W. Mitchell; Bradley K. Woodworth; D. Ryan Norris (2021). Data from: Natal experience and pre-breeding environmental conditions affect lay date plasticity in Savannah sparrows [Dataset]. http://doi.org/10.6084/m9.figshare.14104829.v1
    Available download formats: rtf
    Dataset updated
    Sep 7, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Joseph B. Burant; Eric Heisey; Nathaniel T. Wheelwright; Amy E. M. Newman; Shannon Whelan; Daniel J. Mennill; Stéphanie M. Doucet; Greg W. Mitchell; Bradley K. Woodworth; D. Ryan Norris
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary
    The repository includes the data and R script for an analysis of among- and within-individual differences in the timing of first nesting attempts of the year in relation to natal and pre-breeding environmental conditions (see reference). The data come from a long-term study of the demography of Savannah sparrows (Passerculus sandwichensis) breeding on Kent Island, New Brunswick, Canada (44.58°N, 66.76°W). Climate data were taken from an Environment and Climate Change Canada (ECCC) weather station at the airport in Saint John, NB (45.32°N, 65.89°W; https://www.climate.weather.gc.ca).

    Datasets
    (1) SAVS_all_nests_samp.csv: summary information for all nest attempts observed for all females included in the analysis (i.e., including both first-of-year and subsequent lay dates).
    (2) SAVS_first_nest_per_year_samp.csv: detailed information on the first nesting attempt by each female Savannah sparrow monitored in the population over the course of the study (1987-2019, excluding the years 2005-2007; see Methods: Study site and field sampling in reference).
    (3) mean_daily_temperature.csv: mean daily temperature records from the ECCC weather station at Saint John, NB (see above). These mean daily temperatures were used in a climate sensitivity analysis to determine the optimum pre-breeding window on Kent Island.
    (4) SAVS_annual_summary.csv: annual summaries of average lay dates, breeding density, reproductive output, etc.

    Variables
    - female.id = factor; unique aluminum band number (USGS or Canadian Wildlife Service) assigned to each female
    - rain.categorical = binary (0 = low rainfall; 1 = high rainfall); groups females into low (81-171 mm) and high (172-378 mm) natal rainfall groups, based on the natal environmental conditions observed in each year (see Methods: Statistical analysis in reference)
    - year = integer (1987-2019); study year. The population of Savannah sparrows on Kent Island has been monitored since 1987 (excluding three years, 2005-2007)
    - nest.id = factor; an alpha-numeric code assigned to each nest; unique within years (the combination of year and nest.id would create a unique identifier for each nest)
    - fledglings = integer; number of offspring fledged from a nest
    - total.fledglings = integer; the total number of fledglings reared by a given female over the course of her lifetime
    - nest.attempts = integer; the total number of nest attempts per female (the number of nests over which the total number of fledglings is divided; includes both successful and unsuccessful clutches)
    - hatch.yday = integer; day of the year on which the first egg hatched in a given nest
    - lay.ydate = integer; day of the year on which the first egg was laid in a given nest
    - lay.caldate = date (dd/mm/yyyy); calendar date on which the first egg in a given nest was laid
    - nestling.year = integer; the year in which the female/mother of a given nest was born
    - nestling.density = integer; the density of adult breeders in the year in which a given female (associated with a particular nest) was born
    - total.nestling.rain = numeric; cumulative rainfall (in mm) experienced by a female during the nestling period in her natal year of life (01 June to 31 July; see Methods: Temperature and precipitation data in reference)
    - years.experience = integer; number of previous breeding years per female in a particular year
    - density.total = integer; total number of adult breeders in the study site in a particular year
    - MCfden = numeric; mean-centred female density
    - MCbfden = numeric; mean-centred between-female density
    - MCwfden = numeric; mean-centred within-female density
    - mean.t.window = numeric; mean temperature during the identified pre-breeding window (03 May to 26 May; see Methods: Climate sensitivity analysis in reference)
    - MCtemp = numeric; mean-centred temperature during the optimal pre-breeding window
    - MCbtemp = numeric; mean-centred between-female temperature during the optimal pre-breeding window
    - MCwtemp = numeric; mean-centred within-female temperature during the optimal pre-breeding window
    - female.age = integer; age (in years) of a given female in a given year
    - MCage = numeric; mean-centred female age
    - MCbage = numeric; mean-centred between-female age
    - MCwage = numeric; mean-centred within-female age
    - mean_temp_c = numeric; mean daily temperature in °C
    - meanLD = numeric; mean lay date (in days of the year) across all first nest attempts in a given year
    - sdLD = numeric; standard deviation in lay date (in days of the year) across all first nest attempts in a given year
    - seLD = numeric; standard error in lay date (in days of the year) across all first nest attempts in a given year
    - meanTEMP = numeric; mean temperature (in °C) during the breeding period in a given year
    - records = integer; number of first nest attempts from each year included in the analysis
    - total.nestling.precip = numeric; total rainfall (in mm) during the nestling period (01 June to 31 July) in a given year
    - total.breeding.precip = numeric; total rainfall (in mm) during the breeding period (15 April to 31 July) in a given year
    - density.total = integer; total density of adult breeders on the study site in a given year
    - total.fledglings = integer; total number of offspring fledged by all breeders in the study site in a given year
    - cohort.fecundity = numeric; average number of offspring per breeder in a given year

    Code
    code for Burant et al. - SAVS lay date plasticity analysis.R
    The R script provided includes all the code required to import the data and perform the statistical analyses presented in the manuscript. These include:
    - t-tests investigating the effects of natal conditions (rain.categorical) on female age, nest attempts, and reproductive success
    - linear models of changes in temperature, precipitation, reproductive success, and population density over time, and of lay dates in response to female age, density, etc.
    - a climate sensitivity analysis to identify the optimal pre-breeding window on Kent Island
    - mixed effects models investigating how lay dates respond to changes in within- and between-female age, density, and temperature

    See readme.rtf for a list of datasets and variables.
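The MCb*/MCw* variables follow the standard within-subject centring decomposition for repeated measures: the between-female component is each female's own mean minus the grand mean, and the within-female component is each observation minus that female's mean. The original analysis is in R; the sketch below is a hypothetical Python restatement of the decomposition, not the authors' code:

```python
from statistics import mean

def center_within_between(values_by_id):
    """Within-subject centring for repeated measures.

    values_by_id: dict mapping an individual id to its list of observations
    (e.g. a female's ages across years). Returns (between, within):
    between[id] = individual's mean minus the grand mean;
    within[id]  = each observation minus that individual's own mean.
    """
    grand = mean(v for vals in values_by_id.values() for v in vals)
    between = {i: mean(vals) - grand for i, vals in values_by_id.items()}
    within = {i: [v - mean(vals) for v in vals]
              for i, vals in values_by_id.items()}
    return between, within

# Hypothetical toy data: two females observed in different numbers of years.
ages = {"F1": [1, 2, 3], "F2": [2, 4]}
b, w = center_within_between(ages)
```

Separating the two components lets the mixed-effects models ask whether lay date shifts because older females differ from younger ones (between) or because the same female changes as she ages (within).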

  7. Percentage Differences Streamflow

    • search.dataone.org
    • data.usgs.gov
    • +1 more
    Updated Nov 9, 2017
    + more versions
    Cite
    Burl Goree (2017). Percentage Differences Streamflow [Dataset]. https://search.dataone.org/view/6fef48e6-bf85-4c4f-96c5-f616f3009213
    Dataset updated
    Nov 9, 2017
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Burl Goree
    Variables measured
    WaterYr, site_no, Percent.Diff, Current.Peak.Q, Original.Peak.Q
    Description

    A comma separated values (csv) file that is a snapshot of percent difference between November 19, 2008 and November 14, 2016 peak streamflow. The file lists station identification, water year, original (2008) peak Q, current (2016) peak Q and percent difference calculated per water year. The percent difference was calculated as the absolute value of [(current peak Q - original peak Q)/(original peak Q) x 100], where current peak Q is the 2016 peak and the original peak Q is the 2008 peak. When an original peak Q value is 0, the resultant percent difference calculation is undefined because of division by 0. In these cases, the percent difference field is populated with NA. Those entries are included in the data file so that users can make their own comparisons between the 2008 and 2016 peaks for those cases where the original peak value was 0.
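The percent-difference rule in the description is straightforward to restate in code. A minimal sketch (the function name is hypothetical, not from the dataset), returning None where the file records NA:

```python
def percent_diff(original_peak, current_peak):
    """Absolute percent difference between two peak-streamflow values,
    per the dataset's formula: |(current - original) / original * 100|.
    When the original (2008) peak is 0 the quotient is undefined, so the
    file's NA is represented here as None."""
    if original_peak == 0:
        return None
    return abs((current_peak - original_peak) / original_peak * 100)
```

For example, an original peak of 200 and a current peak of 150 gives |(150 - 200)/200 × 100| = 25%.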

  8. Data_Sheet_1_Simultaneous Interrogation of Cancer Omics to Identify Subtypes...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Mar 28, 2019
    Cite
    Han, GuoQiang; Chen, Jiazhou; Xu, Aodan; Peng, Hong; Cai, Hongmin (2019). Data_Sheet_1_Simultaneous Interrogation of Cancer Omics to Identify Subtypes With Significant Clinical Differences.PDF [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000188251
    Dataset updated
    Mar 28, 2019
    Authors
    Han, GuoQiang; Chen, Jiazhou; Xu, Aodan; Peng, Hong; Cai, Hongmin
    Description

    Recent advances in high-throughput sequencing have accelerated the accumulation of omics data on the same tumor tissue from multiple sources. Intensive study of multi-omics integration on tumor samples can stimulate progress in precision medicine and is promising in detecting potential biomarkers. However, current methods are restricted owing to highly unbalanced dimensions of omics data or difficulty in assigning weights between different data sources. Therefore, the appropriate approximation and constraints of integrated targets remain a major challenge. In this paper, we proposed an omics data integration method, named high-order path elucidated similarity (HOPES). HOPES fuses the similarities derived from various omics data sources to solve the dimensional discrepancy, and progressively elucidate the similarities from each type of omics data into an integrated similarity with various high-order connected paths. Through a series of incremental constraints for commonality, HOPES can take both specificity of single data and consistency between different data types into consideration. The fused similarity matrix gives global insight into patients' correlation and efficiently distinguishes subgroups. We tested the performance of HOPES on both a simulated dataset and several empirical tumor datasets. The test datasets contain three omics types including gene expression, DNA methylation, and microRNA data for five different TCGA cancer projects. Our method was shown to achieve superior accuracy and high robustness compared with several benchmark methods on simulated data. Further experiments on five cancer datasets demonstrated that HOPES achieved superior performances in cancer classification. The stratified subgroups were shown to have statistically significant differences in survival. We further located and identified the key genes, methylation sites, and microRNAs within each subgroup. 
They were shown to achieve high potential prognostic value and were enriched in many cancer-related biological processes or pathways.

  9. Data from: A simple method for statistical analysis of intensity differences...

    • catalog.data.gov
    • healthdata.gov
    • +1 more
    Updated Sep 7, 2025
    Cite
    National Institutes of Health (2025). A simple method for statistical analysis of intensity differences in microarray-derived gene expression data [Dataset]. https://catalog.data.gov/dataset/a-simple-method-for-statistical-analysis-of-intensity-differences-in-microarray-derived-ge
    Dataset updated
    Sep 7, 2025
    Dataset provided by
    National Institutes of Health
    Description

    Background
    Microarray experiments offer a potent solution to the problem of making and comparing large numbers of gene expression measurements, either in different cell types or in the same cell type under different conditions. Inferences about the biological relevance of observed changes in expression depend on the statistical significance of the changes. In lieu of many replicates with which to determine accurate intensity means and variances, reliable estimates of statistical significance remain problematic. Without such estimates, overly conservative choices for significance must be enforced.

    Results
    A simple statistical method for estimating variances from microarray control data, which does not require multiple replicates, is presented. Comparison of datasets from two commercial entities using this difference-averaging method demonstrates that the standard deviation of the signal scales at a level intermediate between the signal intensity and its square root. Application of the method to a dataset related to the β-catenin pathway yields a larger number of biologically reasonable genes whose expression is altered than the ratio method.

    Conclusions
    The difference-averaging method enables determination of variances as a function of signal intensities by averaging over the entire dataset. The method also provides a platform-independent view of important statistical properties of microarray data.

  10. Gender Pay Gap Dataset

    • kaggle.com
    zip
    Updated Feb 2, 2022
    Cite
    fedesoriano (2022). Gender Pay Gap Dataset [Dataset]. https://www.kaggle.com/datasets/fedesoriano/gender-pay-gap-dataset
    Available download formats: zip (61650632 bytes)
    Dataset updated
    Feb 2, 2022
    Authors
    fedesoriano
    Description

    Similar Datasets

    • Company Bankruptcy Prediction: LINK
    • The Boston House-Price Data: LINK
    • California Housing Prices Data (5 new features!): LINK
    • Spanish Wine Quality Dataset: LINK

    Context

    The gender pay gap, or gender wage gap, is the average difference between the remuneration of men and women who are working. Women are generally considered to be paid less than men. There are two distinct figures for the pay gap: the non-adjusted and the adjusted pay gap. The latter typically takes into account differences in hours worked, occupations chosen, education, and job experience. In the United States, for example, the non-adjusted average female annual salary is 79% of the average male salary, compared to 95% for the adjusted figure.

    The reasons link to legal, social, and economic factors, and extend beyond "equal pay for equal work".

    The gender pay gap can be a problem from a public policy perspective because it reduces economic output and means that women are more likely to be dependent upon welfare payments, especially in old age.

    This dataset aims to replicate the data used in the famous paper "The Gender Wage Gap: Extent, Trends, and Explanations", which provides new empirical evidence on the extent of and trends in the gender wage gap, which declined considerably during the 1980–2010 period.

    Citation

    fedesoriano. (January 2022). Gender Pay Gap Dataset. Retrieved [Date Retrieved] from https://www.kaggle.com/fedesoriano/gender-pay-gap-dataset.

    Content

    There are 2 files in this dataset: a) the Panel Study of Income Dynamics (PSID) microdata over the 1980-2010 period, and b) the Current Population Survey (CPS) to provide some additional US national data on the gender pay gap.

    PSID variables:

    Notes: The variables with fz added to their name refer to experience where some zeros in the missing PSID years have been filled in with data from the respondents’ answers to questions about jobs worked during those missing years; the fz variables were used in the regression analyses. The variables with a predict prefix refer to the computation of actual experience accumulated during the years in which the PSID did not survey the respondents; in some cases there are more predicted experience levels than are needed to impute experience in the missing years. Note that the variables yrsexpf, yrsexpfsz, etc., already include these computations, so if you want to use full-time or part-time experience you do not need to add these predict variables in; they are included in the dataset to illustrate the results of the computation process. The variables with an orig prefix are the original PSID variables; these have been processed and in some cases renamed for convenience. The hd suffix means that the variable refers to the head of the family, and the wf suffix means that it refers to the wife or female cohabitor, if there is one. As shown in the accompanying regression program, these orig variables are not used directly in the regressions; there are more of the original PSID variables, which were used to construct the variables used in the regressions.

    1. intnum68: 1968 INTERVIEW NUMBER
    2. pernum68: PERSON NUMBER 68
    3. wave: Current Wave of the PSID
    4. sex: gender SEX OF INDIVIDUAL (1=male, 2=female)
    5. intnum: Wave-specific Interview Number
    6. farminc: Farm Income
    7. region: regLab Region of Current Interview
    8. famwgt: this is the PSID’s family weight, which is used in all analyses
    9. relhead: ER34103L this is the relation to the head of household (10=head; 20=legally married wife; 22=cohabiting partner)
    10. age: Age
    11. employed: ER34116L Whether or not employed or on temp leave (everyone gets a 1 for this variable, since our wage analyses use only the currently employed)
    12. sch: schLbl Highest Year of Schooling
    13. annhrs: Annual Hours Worked
    14. annlabinc: Annual Labor Income
    15. occ: 3 Digit Occupation 2000 codes
    16. ind: 3 Digit Industry 2000 codes
    17. white: White, nonhispanic dummy variable
    18. black: Black, nonhispanic dummy variable
    19. hisp: Hispanic dummy variable
    20. othrace: Other Race dummy variable
    21. degree: degreeLbl Agent's Degree Status (0=no college degree; 1=bachelor’s without advanced degree; 2=advanced degree)
    22. degupd: degreeLbl Agent's Degree Status (Updated with 2009 values)
    23. schupd: schLbl Schooling (updated years of schooling)
    24. annwks: Annual Weeks Worked
    25. unjob: unJobLbl Union Coverage dummy variable
    26. usualhrwk: Usual Hrs Worked Per Week
    27. labincbus: Labor Income from...
  11. Table_1_Comparison between two cancer registry quality check systems:...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated May 25, 2023
    Cite
    Barigelletti, Giulio; Fusco, Mario; Group, The Collaborative Working; Contiero, Paolo; Fabiano, Sabrina; Bidoli, Ettore; Tittarelli, Andrea; Mazzucco, Walter; Stracci, Fabrizio; Vicentini, Massimo; Perotti, Viviana; Tagliabue, Giovanna; Pesce, Maria Teresa (2023). Table_1_Comparison between two cancer registry quality check systems: functional features and differences in an Italian network of cancer registries dataset.docx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001040159
    Dataset updated
    May 25, 2023
    Authors
    Barigelletti, Giulio; Fusco, Mario; Group, The Collaborative Working; Contiero, Paolo; Fabiano, Sabrina; Bidoli, Ettore; Tittarelli, Andrea; Mazzucco, Walter; Stracci, Fabrizio; Vicentini, Massimo; Perotti, Viviana; Tagliabue, Giovanna; Pesce, Maria Teresa
    Description

    Purpose
    The aim of this study was to compare the functional characteristics of two computer-based systems for quality control of cancer registry data through analysis of their output differences.

    Methods
    The study used cancer incidence data, registered between 1986 and 2017, from 22 of the 49 registries of the Italian Network of Cancer Registries. Two different data checking systems, developed by the WHO International Agency for Research on Cancer (IARC) and by the Joint Research Center (JRC) with the European Network of Cancer Registries (ENCR), and routinely used by registrars, were used to check the quality of the data. The outputs generated by the two systems on the same dataset of each registry were analyzed and compared.

    Results
    The study included a total of 1,305,689 cancer cases. The overall quality of the dataset was high, with 86% (81.7-94.1) microscopically verified cases and only 1.3% (0.03-3.06) cases with a diagnosis by death certificate only. The two check systems identified a low percentage of errors (JRC-ENCR 0.17% and IARC 0.003%) and about the same proportion of warnings (JRC-ENCR 2.79% and IARC 2.42%) in the dataset. Forty-two cases (2% of errors) and 7,067 cases (11.5% of warnings) were identified by both systems in equivalent categories. 11.7% of warnings related to TNM staging were identified by the JRC-ENCR system only. The IARC system mainly identified incorrect combinations of tumor grade and morphology (72.5% of warnings).

    Conclusion
    Both systems apply checks on a common set of variables, but some variables are checked by only one of the systems (for example, checks on patient follow-up and tumor stage at diagnosis are included in the JRC-ENCR system only). Most errors and warnings were categorized differently by the two systems but usually described the same issues, with warnings related to “morphology” (JRC-ENCR) and “histology” (IARC) being the most frequent. It is important to find the right balance between the need to maintain high standards of data quality and the workability of such systems in the daily routine of the cancer registry.

  12. Data from: Summer Steelhead Distribution [ds341]

    • data.ca.gov
    • data.cnra.ca.gov
    • +5more
    Updated Oct 12, 2023
    California Department of Fish and Wildlife (2023). Summer Steelhead Distribution [ds341] [Dataset]. https://data.ca.gov/dataset/summer-steelhead-distribution-ds3411
    Explore at:
    Available download formats: geojson, html, kml, csv, zip, arcgis geoservices rest api
    Dataset updated
    Oct 12, 2023
    Dataset authored and provided by
    California Department of Fish and Wildlife (https://wildlife.ca.gov/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summer Steelhead Distribution, October 2009 Version. This dataset depicts observation-based stream-level geographic distribution of anadromous summer-run steelhead trout, Oncorhynchus mykiss irideus (O. mykiss), in California. It was developed for the express purpose of assisting with steelhead recovery planning efforts. The distributions reported in this dataset were derived from a subset of the data contained in the Aquatic Species Observation Database (ASOD), a Microsoft Access multi-species observation data capture application. ASOD is an ongoing project designed to capture as complete a set of statewide inland aquatic vertebrate species observation information as possible. Please note: a separate distribution is available for winter-run steelhead; contact information is the same as for the above. 
    ASOD observation data were used to develop a network of stream segments. These lines were developed by "tracing down" from each observation to the sea using the flow properties of USGS National Hydrography Dataset (NHD) High Resolution hydrography. Lastly, these lines, representing stream segments, were assigned a value of Anad Present (anadromous present). The end result (i.e., this layer) consists of a set of lines representing the distribution of steelhead based on observations in the Aquatic Species Observation Database. 
    This dataset represents stream reaches that are known or believed to be used by steelhead, based on steelhead observations; thus, it contains only positive steelhead occurrences. The absence of distribution on a stream does not necessarily indicate that steelhead do not utilize that stream. Additionally, steelhead may not be found in all streams or reaches each year, due to natural variations in run size, water conditions, and other environmental factors. 
    The information in this dataset should be used as an indicator of steelhead presence or suspected presence at the time of the observation, as indicated by the 'Late_Yr' (Latest Year) field attribute. The line features in the dataset may not represent the maximum extent of steelhead on a stream; rather, this distribution most likely underestimates the actual distribution of steelhead. The distribution is based on observations found in the ASOD database, and the individual observations may not have occurred at the upper extent of anadromous occupation. In addition, no attempt was made to capture every observation of O. mykiss, so it should not be assumed that this dataset is complete for each stream. The distribution dataset was built solely from the ASOD observational data; no additional data (habitat mapping, barriers data, gradient modeling, etc.) were utilized to either add to or validate it. It is very possible that an anadromous observation in this dataset has been recorded above (upstream of) a barrier as identified in the Passage Assessment Database (PAD). In the near future, we hope to perform a comparative analysis between this dataset and the PAD to identify and resolve all such discrepancies; such an analysis will add rigor to and help validate both datasets. 
    This dataset has recently undergone a review. Data source contributors as well as CDFG fisheries biologists were provided the opportunity to review and suggest edits or additions. Data contributors were notified and invited to review and comment on the handling of the information that they provided. The distribution was then posted to an intranet mapping application, and CDFG biologists were provided an opportunity to review and comment on the dataset. During this review, biologists were also encouraged to add new observation data. The resulting final distribution contains their suggestions and additions. 
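    The "tracing down" step described above can be sketched as a walk over a downstream-pointing flow table. This is a minimal illustration only: the toy dict stands in for the NHD High Resolution flow relationships, and all identifiers are invented.

    ```python
    # Sketch of "tracing down" from an observation to the sea.
    # The flow table is hypothetical; in practice it would be derived
    # from NHD High Resolution hydrography, not from this toy dict.

    def trace_downstream(segment_id, downstream_of):
        """Follow flow links from an observation's segment to the outlet."""
        path = [segment_id]
        while segment_id in downstream_of:
            segment_id = downstream_of[segment_id]
            path.append(segment_id)
        return path

    # Toy network: segment A flows to B, B to C, C to the outlet.
    flow = {"A": "B", "B": "C", "C": "outlet"}
    print(trace_downstream("A", flow))  # ['A', 'B', 'C', 'outlet']
    ```

    The union of all such traced paths, one per observation, yields the set of stream segments marked Anad Present.
    
    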
Please refer to "Use Constraints" section below.

  13. Synthesized anthropometric data for the German working-age population

    • data.niaid.nih.gov
    Updated Dec 8, 2023
    Ackermann, Alexander; Bonin, Dominik; Jaitner, Thomas; Peters, Markus; Radke, Dörte; Wischniewski, Sascha (2023). Synthesized anthropometric data for the German working-age population [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8042776
    Explore at:
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    Institute for Community Medicine - SHIP/KEF, University Medicine Greifswald
    Federal Institute for Occupational Safety and Health (BAuA)
    Institute for Sport and Sport Science, TU Dortmund University
    Authors
    Ackermann, Alexander; Bonin, Dominik; Jaitner, Thomas; Peters, Markus; Radke, Dörte; Wischniewski, Sascha
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The anthropometric datasets presented here are virtual datasets. The unweighted virtual dataset was generated using a synthesis and subsequent validation algorithm (Ackermann et al., 2023). The underlying original dataset used in the algorithm was collected within a regional epidemiological public health study in northeastern Germany (SHIP, see Völzke et al., 2022). Important details regarding the collection of the anthropometric dataset within SHIP (e.g. sampling strategy, measurement methodology & quality assurance process) are discussed extensively in the study by Bonin et al. (2022). To approximate nationally representative values for the German working-age population, the virtual dataset was weighted with reference data from the first survey wave of the Study on Health of Adults in Germany (DEGS1, see Scheidt-Nave et al., 2012). Two different algorithms were used for the weighting procedure: (1) iterative proportional fitting (IPF), described in more detail in Bonin et al. (2022), and (2) a nearest-neighbor approach (1NN), presented in Kumar and Parkinson (2018). Weighting coefficients were calculated for both algorithms, and it is left to the practitioner which coefficients to use in practice. The weighted virtual dataset therefore has two additional columns containing the weighting coefficients calculated with IPF ("WeightCoef_IPF") or 1NN ("WeightCoef_1NN"). Note that, due to the sparse data basis at the distribution edges of SHIP compared to DEGS1, values below the 5th and above the 95th percentile should be treated with caution. In addition, the following characteristics describe both the weighted and unweighted virtual datasets: in accordance with ISO 15535, values for "BMI" are in [kg/m2], values for "Body mass" are in [kg], and values for all other measures are in [mm]; anthropometric measures correspond to the measures defined in ISO 7250-1. 
Offset values were calculated for seven anthropometric measures because there were systematic differences in the measurement methodology between SHIP and ISO 7250-1 regarding the definition of two bony landmarks: the acromion and the olecranon. Since these seven measures rely on one of these bony landmarks, and it was not possible to modify the SHIP methodology regarding landmark definitions, offsets had to be calculated to obtain ISO-compliant values. In the presented datasets, two columns exist for these seven measures. One column contains the measured values with the landmarking definitions from SHIP, and the other column (marked with the suffix "_offs") contains the calculated ISO-compliant values (for more information concerning the offset values see Bonin et al., 2022). The sample size is N = 5000 for the male and female subsets. The original SHIP dataset has a sample size of N = 1152 (women) and N = 1161 (men). Due to this discrepancy between the original SHIP dataset and the virtual datasets, users may get a false sense of comfort when using the virtual data, which should be mentioned at this point. In order to get the best possible representation of the original dataset, a virtual sample size of N = 5000 is advantageous and has been confirmed in pre-tests with varying sample sizes, but it must be kept in mind that the statistical properties of the virtual data are based on an original dataset with a much smaller sample size.
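    The first weighting algorithm mentioned above, iterative proportional fitting, can be sketched in a few lines. The table values and target margins below are invented for illustration; the actual procedure weighted the SHIP-based virtual dataset to DEGS1 reference margins (see Bonin et al., 2022 for the real implementation).

    ```python
    import numpy as np

    # Minimal sketch of iterative proportional fitting (IPF): alternately
    # rescale a table's rows and columns until both margins match targets.

    def ipf(table, row_targets, col_targets, iters=100):
        w = table.astype(float).copy()
        for _ in range(iters):
            w *= (row_targets / w.sum(axis=1))[:, None]   # match row margins
            w *= (col_targets / w.sum(axis=0))[None, :]   # match column margins
        return w

    # Toy 2x2 table (e.g. sex x age group) with made-up reference margins.
    table = np.array([[40.0, 30.0], [35.0, 45.0]])
    fitted = ipf(table, row_targets=np.array([80.0, 70.0]),
                 col_targets=np.array([60.0, 90.0]))
    print(fitted.sum(axis=1))  # approximately [80. 70.]
    ```

    The fitted-to-original cell ratios play the role of the weighting coefficients stored in the "WeightCoef_IPF" column.
    
    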

  14. Social Reward and Nonsocial Reward Processing Across the Adult Lifespan: An...

    • openneuro.org
    Updated Aug 1, 2024
    David V. Smith; Cooper J. Sharp; Abraham Dachs; James Wyngaarden; Daniel Sazhin; Jen Yang; Melanie Kos; Tia Tropea; Ishika Kohli; John A. Clithero; Ingrid Olson; Tania Giovannetti; Dominic Fareri; Johanna M. Jarcho (2024). Social Reward and Nonsocial Reward Processing Across the Adult Lifespan: An Interim Multi-echo fMRI and Diffusion Dataset [Dataset]. http://doi.org/10.18112/openneuro.ds005123.v1.1.1
    Explore at:
    Dataset updated
    Aug 1, 2024
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    David V. Smith; Cooper J. Sharp; Abraham Dachs; James Wyngaarden; Daniel Sazhin; Jen Yang; Melanie Kos; Tia Tropea; Ishika Kohli; John A. Clithero; Ingrid Olson; Tania Giovannetti; Dominic Fareri; Johanna M. Jarcho
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Summary

    This is a preliminary release of a dataset supported by the National Institute on Aging and the National Institutes of Health. The full dataset is described in a submission to Data in Brief.

    Abstract

    Social relationships change across the lifespan as social networks narrow and motivational priorities shift. These changes may affect, or reflect, differences in how older adults make decisions related to processing social and non-social rewards. While we have shown initial evidence that older adults have a blunted response to some features of social reward, further work in larger samples is needed to probe the extent to which age-related differences translate to real-world consequences, such as financial exploitation. To address this gap, we are conducting a 5-year study funded by the National Institute on Aging (NIH R01-AG067011). Over the course of the funding period (2021-2026), this study seeks to: 1) characterize neural responses to social rewards across adulthood; 2) relate those responses to risk for financial exploitation and sociodemographic factors tied to risk; and 3) examine changes in risk for financial exploitation over time in healthy and vulnerable groups of older adults. This paper describes the preliminary release of data for the larger study. Adults (N=114; 40 male / 70 female / 4 other or non-binary; 21-80 years of age, M = 42.78, SD = 17.13) were recruited from the community to undergo multi-echo fMRI while completing tasks that measure brain function during social reward and decision-making. Tasks probe neural response to social reward (e.g., peer vs. monetary feedback) and social context and closeness (e.g., sharing a monetary reward with a friend compared to a stranger). Neural response to social decision-making is probed via economic trust and ultimatum games. Functional data are complemented by a T1-weighted anatomical scan and diffusion-weighted imaging (DWI) to enable tractography. This dataset has extensive potential for re-use, including leveraging multimodal neuroimaging data and within-subject measures of fMRI data from different tasks – data features that are rarely seen in an adult lifespan dataset.

    Expanded Task Names

    1. doors and socialdoors: a task in which participants received well-matched social and monetary rewards and punishment;
    2. ugdg: a strategic reward-based decision-making task with Ultimatum and Dictator Game conditions
    3. trust: a task where participants choose an amount to invest in their partner (friend, stranger, or computer) and see whether or not that partner shared the tripled amount back;
    4. sharedreward: a task where participants shared rewards or losses with peers, strangers, or non-human partners

    Additional Usage Notes

    We note that participants 10584, 10951, and 11005 are missing dwi. This is due to chiller malfunctions during the sequence that halted data collection. We also note that not all participants have two runs of each task. This was due to time constraints during the scan visits.

    Code related to this dataset can be found on GitHub (https://github.com/DVS-Lab/SRPAL-DataInBrief/code/).

    Original sourcedata for behavioral data is included in the sourcedata folder. Due to privacy restrictions, we cannot release original sourcedata for the imaging data (i.e., DICOM files).

  15. Monthly time series of spatially enhanced relative humidity for Europe at 1000 m resolution (2000 - 2022) derived from ERA5-Land data

    • data.opendatascience.eu
    • data.mundialis.de
    • +1more
    Updated Dec 16, 2023
    (2023). Monthly time series of spatially enhanced relative humidity for Europe at 1000 m resolution (2000 - 2022) derived from ERA5-Land data [Dataset]. https://data.opendatascience.eu/geonetwork/srv/search?keyword=TBE
    Explore at:
    Dataset updated
    Dec 16, 2023
    Area covered
    Europe
    Description

    Overview: ERA5-Land is a reanalysis dataset providing a consistent view of the evolution of land variables over several decades at an enhanced resolution compared to ERA5. ERA5-Land has been produced by replaying the land component of the ECMWF ERA5 climate reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics, producing data that go back several decades and provide an accurate description of the climate of the past.
    Processing steps: The original hourly ERA5-Land air temperature 2 m above ground and dewpoint temperature 2 m above ground data have been spatially enhanced from 0.1 degree to 30 arc seconds (approx. 1000 m) spatial resolution by image fusion with CHELSA data (V1.2) (https://chelsa-climate.org/). For each day, the corresponding monthly long-term average of CHELSA was used. The aim was to use the fine spatial detail of CHELSA while preserving the general regional pattern and fine temporal detail of ERA5-Land. The steps included aggregation and enhancement, specifically:
    1. spatially aggregate CHELSA to the resolution of ERA5-Land;
    2. calculate the difference ERA5-Land minus aggregated CHELSA;
    3. interpolate the differences with a Gaussian filter to 30 arc seconds;
    4. add the interpolated differences to CHELSA.
    Subsequently, the temperature time series were aggregated on a daily basis. From these, daily relative humidity was calculated for the time period 01/2000 - 12/2023. 
    Relative humidity (rh2m) has been calculated from air temperature 2 m above ground (Ta) and dewpoint temperature 2 m above ground (Td) using the formula for saturated water pressure from Wright (1997):
    maximum water pressure = 611.21 * exp(17.502 * Ta / (240.97 + Ta))
    actual water pressure = 611.21 * exp(17.502 * Td / (240.97 + Td))
    relative humidity = actual water pressure / maximum water pressure
    The resulting relative humidity has been aggregated to monthly averages. Resultant values have been converted to represent percent * 10, thus covering a theoretical range of [0, 1000]. The data have been reprojected to EU LAEA.
    File naming scheme (YYYY = year; MM = month): ERA5_land_rh2m_avg_monthly_YYYY_MM.tif
    Projection + EPSG code: EU LAEA (EPSG: 3035)
    Spatial extent: north: 6874000; south: -485000; west: 869000; east: 8712000
    Spatial resolution: 1000 m
    Temporal resolution: monthly
    Pixel values: percent * 10 (scaled to integer; example: value 738 = 73.8 %)
    Software used: GDAL 3.2.2 and GRASS GIS 8.0.0/8.3.2
    Original ERA5-Land dataset license: https://apps.ecmwf.int/datasets/licences/copernicus/
    CHELSA climatologies (V1.2), data used: Karger, D.N., Conrad, O., Böhner, J., Kawohl, T., Kreft, H., Soria-Auza, R.W., Zimmermann, N.E., Linder, H.P., Kessler, M. (2018): Data from: Climatologies at high resolution for the earth's land surface areas. Dryad Digital Repository. http://dx.doi.org/doi:10.5061/dryad.kd1d4
    Original peer-reviewed publication: Karger, D.N., Conrad, O., Böhner, J., Kawohl, T., Kreft, H., Soria-Auza, R.W., Zimmermann, N.E., Linder, P., Kessler, M. (2017): Climatologies at high resolution for the Earth's land surface areas. Scientific Data 4, 170122. https://doi.org/10.1038/sdata.2017.122
    Processed by: mundialis GmbH & Co. KG, Germany (https://www.mundialis.de/)
    Reference: Wright, J.M. (1997): Federal Meteorological Handbook No. 3 (FCM-H3-1997). Office of the Federal Coordinator for Meteorological Services and Supporting Research, Washington, DC.
    Data is also available in Latitude-Longitude/WGS84 (EPSG: 4326) projection: https://data.mundialis.de/geonetwork/srv/eng/catalog.search#/metadata/b9ce7dba-4130-428d-96f0-9089d8b9f4a5
    Acknowledgements: This study was partially funded by EU grant 874850 MOOD. The contents of this publication are the sole responsibility of the authors and don't necessarily reflect the views of the European Commission.
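    The humidity formula above can be turned into a small sketch, including the final percent * 10 integer scaling used for the pixel values. The constants come straight from the Wright (1997) formula quoted above; the function names are illustrative, not part of the dataset's processing code.

    ```python
    import math

    # Saturated water vapour pressure (Wright, 1997), temperature in deg C.
    def saturation_vapour_pressure(t_celsius):
        return 611.21 * math.exp(17.502 * t_celsius / (240.97 + t_celsius))

    # Relative humidity from 2 m air temperature (ta) and 2 m dewpoint (td),
    # scaled to percent * 10 as stored in the GeoTIFF pixels.
    def rh2m_scaled(ta, td):
        rh = saturation_vapour_pressure(td) / saturation_vapour_pressure(ta)
        return round(rh * 1000)  # integer in [0, 1000]

    print(rh2m_scaled(20.0, 15.0))  # an integer around 730, i.e. ~73 %
    ```

    By construction, `rh2m_scaled(t, t)` returns 1000 (saturation), matching the top of the stated [0, 1000] range.
    
    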

  16. EDDEN

    • openneuro.org
    Updated Aug 9, 2023
    Jose Pedro Manzano Patron; Steen Moeller; Jesper L.R. Andersson; Essa Yacoub; Stamatios N. Sotiropoulos (2023). EDDEN [Dataset]. http://doi.org/10.18112/openneuro.ds004666.v1.0.0
    Explore at:
    Dataset updated
    Aug 9, 2023
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Jose Pedro Manzano Patron; Steen Moeller; Jesper L.R. Andersson; Essa Yacoub; Stamatios N. Sotiropoulos
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    EDDEN stands for *E*valuation of *D*MRI *DEN*oising approaches. The data correspond to the publication: Manzano Patron, J.P., Moeller, S., Andersson, J.L.R., Yacoub, E., Sotiropoulos, S.N. Denoising Diffusion MRI: Considerations and implications for analysis. doi: https://doi.org/10.1101/2023.07.24.550348. Please cite it if you use this dataset.

    • Description of the dataset: Raw complex data (magnitude and phase) were acquired for a single subject at different SNR/resolution regimes, under ~/EDDEN/sub-01/ses-XXX/dwi/:

      • Dataset A (2mm)

        • This dataset represents a relatively medium-to-high SNR regime.
        • 6 repeats of a 2mm isotropic multi-shell dataset each implementing the UK Biobank protocol (Miller et al., 2016)
        • TR=3s, TE=92ms, MB=3, no in-plane acceleration, scan time ∼6 minutes per repeat.
        • For each repeat, 116 volumes were acquired: 105 volumes with AP phase encoding direction, comprising 5 b = 0 s/mm2 volumes and 100 diffusion encoding orientations, with 50 b = 1000 s/mm2 and 50 b = 2000 s/mm2 volumes; and 4 b = 0 s/mm2 volumes with reversed phase encoding direction (PA) for susceptibility-induced distortion correction (Andersson and Skare, 2002).
        • NOTES: Only 1 PA set of volumes was acquired for all the runs.
      • Dataset B (1p5mm):

        • This is a low-to-medium SNR dataset, with relatively high resolution.
        • 5 repeats of a 1.5 mm isotropic multi-shell dataset, each implementing an HCP-like protocol in terms of q-space sampling (Sotiropoulos et al., 2013a).
        • TR=3.23 s, TE=89.2 ms, MB=4, no in-plane acceleration, scan time ∼16 minutes per repeat.
        • For each repeat, 300 volumes were acquired: 297 volumes with AP phase encoding direction, comprising 27 b = 0 s/mm2 volumes and 270 diffusion encoding orientations, with 90 b = 1000 s/mm2, 90 b = 2000 s/mm2, and 90 b = 3000 s/mm2 volumes; and 3 b = 0 s/mm2 volumes with PA phase encoding for susceptibility-induced distortion correction.
      • Dataset C (0p9mm):

        • This is a very low SNR dataset, representing extremely noisy data that, without denoising, are expected to be borderline unusable (particularly for the higher b values).
        • 4 repeats of an ultra-high-resolution multi-shell dataset with 0.9mm isotropic resolution.
        • TR=6.569 s, TE=91 ms, MB=3, in-plane GRAPPA=2, scan time ∼22 minutes per repeat.
        • For each repeat, 202 volumes were acquired with orientations as in (Harms et al., 2018): 199 volumes with AP phase encoding direction, comprising 14 b = 0 s/mm2 volumes and 185 diffusion encoding orientations, with 93 b = 1000 s/mm2 and 92 b = 2000 s/mm2 volumes; and 3 b = 0 s/mm2 volumes with PA phase encoding for susceptibility-induced distortion correction.
        • NOTES: The phase of the PAs is not available, and the same PA is used for runs 3 and 4.

    Each dataset contains their own T1w-MPRAGE under ~/EDDEN/sub-01/ses-XXX/anat/. Each data set was acquired on a different day, to minimise fatigue, but all repeats within a dataset were acquired in the same session. All acquisitions were obtained parallel to the anterior and posterior commissure line, covering the entire cerebrum.

    DERIVATIVES: These are the denoised versions of the raw data for the different datasets, the pre-processed data for the raw, denoised and averaged versions, and the FA, MD and V1 outputs from the DTI model fitting (see the *Data pre-processing* section below).
    - Denoised data:
      - NLM (NLM): Non-Local Means denoising applied to magnitude raw data.
      - MPPCA (|MPPCA|): Marchenko-Pastur PCA denoising applied to magnitude raw data.
      - MPPCA_complex (MPPCA*): Marchenko-Pastur PCA denoising applied to complex raw data.
      - NORDIC (NORDIC): NORDIC applied to complex raw data.
      - AVG_mag (|AVG|): the average of the multiple repeats in magnitude.
      - AVG_complex (AVG*): the average of the multiple repeats in the complex space.
    - Masks: under ~/EDDEN/derivatives/ses-XXX/masks there are different masks for each dataset:
      - GM_mask: gray matter mask.
      - WM_mask: white matter mask.
      - CC_mask: corpus callosum mask.
      - CS_mask: centrum semiovale mask.
      - ventricles_mask: CSF ventricles mask.
      - nodif_brain_mask: eroded brain mask.

    • Data pre-processing: Both magnitude and phase data were retained for each acquisition to allow evaluations of denoising in both magnitude and complex domains. In order to allow distortion correction and processing for complex data and avoid phase incoherence artifacts, the raw complex-valued diffusion data were rotated to the real axis using the phase information. A spatially varying phase-field was estimated and complex vectors were multiplied with the conjugate of the phase. The phase-field was estimated uniquely for each slice and volume by firstly removing the phase variations from k-space sampling and coil sensitivity combination, and secondly by removing an estimate of a smooth residual phase-field. The smooth residual phase-field was estimated using a low-pass filter with a narrowed tapered cosine filter (a Tukey filter with an FWHM of 58%). Hence, the final signal was rotated approximately along the real axis, subject to the smoothness constraints.
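    The rotation-to-the-real-axis step above reduces to multiplying each complex voxel by the conjugate of a unit-magnitude phase-field estimate. In this toy sketch the "estimated" phase-field is taken to be the true voxel phase, so the rotation is exact; in the actual pipeline the field is a Tukey-windowed low-pass estimate per slice and volume, so the result is only approximately real.

    ```python
    import numpy as np

    # Toy complex "image": random magnitudes with random voxel phases.
    rng = np.random.default_rng(0)
    magnitude = rng.uniform(0.5, 1.0, size=(4, 4))
    phase = rng.uniform(-np.pi, np.pi, size=(4, 4))
    complex_data = magnitude * np.exp(1j * phase)

    # Unit-magnitude phase-field estimate (here: the true phase).
    phase_field = np.exp(1j * phase)

    # Rotate onto the real axis: multiply by the conjugate of the phase-field.
    rotated = complex_data * np.conj(phase_field)

    print(np.allclose(rotated.imag, 0.0))       # imaginary part vanishes
    print(np.allclose(rotated.real, magnitude)) # real part recovers the magnitude
    ```

    With a smoothed (imperfect) phase-field estimate, the residual imaginary channel carries mostly noise, which is what complex-domain denoisers such as NORDIC exploit.
    
    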

    Having the magnitude and complex data for each dataset, denoising was applied using different approaches prior to any pre-processing to minimise potential changes in statistical properties of the raw data due to interpolations (Veraart et al., 2016b). For denoising, we used the following four algorithms:

    - **Denoising in the magnitude domain**: i) The Non-Local Means (**NLM**) (Buades et al., 2005) was applied as an exemplar of a simple non-linear filtering method adapted from traditional signal pre-processing. We used the default implementation in DIPY (Garyfallidis et al., 2014), where each dMRI volume is denoised independently. ii) The Marchenko-Pastur PCA (MPPCA) (denoted as **|MPPCA|** throughout the text) (Cordero-Grande et al., 2019; Veraart et al., 2016b), reflecting a commonly used approach that performs PCA over image patches and uses the MP theorem to identify noise components from the eigenspectrum. We used the default MrTrix3 implementation (Tournier et al., 2019).
    
    - **Denoising in the complex domain**: i) MPPCA applied to complex data (rotated along the real axis), denoted as **MPPCA***. We applied the MrTrix3 implementation of the magnitude MPPCA to the complex data rotated to the real axis (we found that this approach was more stable in terms of handling phase images and achieved better denoising, compared to the MrTrix3 complex MPPCA implementation). ii) The **NORDIC** algorithm (Moeller et al., 2021a), which also relies on the MP theorem, but performs variance spatial normalisation prior to noise component identification and filtering, to ensure noise stationarity assumptions are fulfilled.
    

    All data, both raw and their four denoised versions, underwent the same pre-processing steps for distortion and motion correction (Sotiropoulos et al., 2013b) using an in-house pipeline (Mohammadi-Nejad et al., 2019). To avoid confounds from potential misalignment in the distortion-corrected diffusion native space obtained from each approach, we chose to compute a single susceptibility-induced off-resonance fieldmap using the raw data for each of the Datasets A, B and C; and then use the corresponding fieldmap for all denoising approaches in each dataset so that the reference native space stays the same for each of A, B and C. Note that differences between fieldmaps before and after denoising are small anyway, as the relatively high SNR b = 0 s/mm2 images are used to estimate them. But these small differences can cause noticeable misalignments between methods and confounds when attempting quantitative comparisons, which we avoid here using our approach. Hence, for each of the Datasets A, B and C, the raw blip-reversed b = 0 s/mm2 volumes were used in FSL’s topup to generate a fieldmap (Andersson and Skare, 2002). This was then used in individual runs of FSL’s eddy for each approach (Andersson and Sotiropoulos, 2016), which applied the common fieldmap and performed corrections for eddy current and subject motion in a single interpolation step. FSL’s eddyqc (Bastiani et al., 2019) was used to generate quality control (QC) metrics, including SNR and angular CNR for each b value. The same T1w image was used within each dataset. A linear transformation estimated using boundary-based registration (Greve and Fischl, 2009) was obtained from the corrected native diffusion space to the T1w space. The T1w image was skull-stripped and non-linearly registered to the MNI standard space, allowing further analysis. Masks of white and grey matter were obtained from the T1w image using FSL’s FAST (Jenkinson et al., 2012) and they were aligned to diffusion space.

  17. Replication Data for: What the MIPVU protocol doesn’t tell you (even though it really does)

    • dataverse.azure.uit.no
    • dataverse.no
    • +1more
    txt, type/x-r-syntax
    Updated Sep 28, 2023
    Susan Nacey; Tina Krennmayr; Aletta G. Dorst; W. Gudrun Reijnierse (2023). Replication Data for: What the MIPVU protocol doesn’t tell you (even though it really does) [Dataset]. http://doi.org/10.18710/F04UW5
    Explore at:
    Available download formats: txt(4687), type/x-r-syntax(8474), txt(160256), type/x-r-syntax(8464), txt(160856)
    Dataset updated
    Sep 28, 2023
    Dataset provided by
    DataverseNO
    Authors
    Susan Nacey; Tina Krennmayr; Aletta G. Dorst; W. Gudrun Reijnierse
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The two datasets provided here were used to provide inter-rater reliability statistics for the application of a metaphor identification procedure to texts written in English. Three experienced metaphor researchers applied the Metaphor Identification Procedure Vrije Universiteit (MIPVU) to approximately 1500 words of text from two English-language newspaper articles. The dataset Eng1 contains each researcher’s independent analysis of the lexical demarcation and metaphorical status of each word in the sample. The dataset Eng2 contains a second analysis of the same texts by the same three researchers, carried out after a comparison of our responses in Eng1 and a troubleshooting session where we discussed our differences. The accompanying R code was used to produce the three-way and pairwise inter-rater reliability data reported in Section 3.2 of the chapter: How do I determine what comprises a lexical unit? The headings in both datasets are identical, although the order of the columns differs in the two files. In both datasets, each line corresponds to one orthographic word from the newspaper texts. 
    Chapter Abstract: The first part of this chapter discusses various ‘nitty-gritty’ practical aspects about the original MIPVU intended for the English language. Our focus in these first three sections is on common pitfalls for novice MIPVU users that we have encountered when teaching the procedure. First, we discuss how to determine what comprises a lexical unit (section 3.2). We then move on to how to determine a more basic meaning of a lexical unit (section 3.3), and subsequently discuss how to compare and contrast contextual and basic senses (section 3.4). We illustrate our points with actual examples taken from some of our teaching sessions, as well as with our own study into inter-rater reliability, conducted for the purposes of this new volume about MIPVU in multiple languages. 
Section 3.5 shifts to another topic that new MIPVU users ask about – namely, which practical tools they can use to annotate their data in an efficient way. Here we discuss some tools that we find useful, illustrating how we utilized them in our inter-rater reliability study. We close this part with section 3.6, a brief discussion about reliability testing. The second part of this chapter adopts more of a bird’s-eye view. Here we leave behind the more technical questions of how to operationalize MIPVU and its steps, and instead respond more directly to the question posed above: Do we really have to identify every metaphor in every bit of our data? We discuss possible approaches for research projects involving metaphor identification, by exploring a number of important questions that all researchers need to ask themselves (preferably before they embark on a major piece of research). Section 3.7 weighs some of the differences between quantitative and qualitative approaches in metaphor research projects, while section 3.8 talks about considerations when it comes to choosing which texts to investigate, as well as possible research areas where metaphor identification can play a useful role. We close this chapter in section 3.9 with a recap of our ‘take-away’ points – that is, a summary of the highlights from our entire discussion.
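    The pairwise inter-rater reliability reported in Section 3.2 can be illustrated with a minimal Cohen's kappa computation. The study's own analysis used the accompanying R code; this Python sketch and its invented labels ("met" / "lit" for metaphor-related vs literal) are illustrative only.

    ```python
    from collections import Counter

    # Cohen's kappa for two raters: chance-corrected agreement.
    def cohens_kappa(a, b):
        n = len(a)
        observed = sum(x == y for x, y in zip(a, b)) / n
        ca, cb = Counter(a), Counter(b)
        expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
        return (observed - expected) / (1 - expected)

    # Toy codings: one label per orthographic word (invented data).
    rater1 = ["met", "met", "lit", "lit", "lit", "met"]
    rater2 = ["met", "lit", "lit", "lit", "lit", "met"]
    print(round(cohens_kappa(rater1, rater2), 3))  # 0.667
    ```

    For three raters, the same idea extends to each of the three pairs, plus a three-way statistic such as Fleiss' kappa.
    
    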

  18. a dataset of gene list in figure 3A

    • springernature.figshare.com
    xlsx
    Updated Jun 3, 2024
    Cite
    Shohei Ochi; Shyu Manabe; Takako Kikkawa; Sara Ebrahimiazar; Ryuichi Kimura; Kaichi Yoshizaki; Noriko Osumi (2024). a dataset of gene list in figure 3A [Dataset]. http://doi.org/10.6084/m9.figshare.25247449.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Jun 3, 2024
    Dataset provided by
    Figsharehttp://figshare.com/
    figshare
    Authors
    Shohei Ochi; Shyu Manabe; Takako Kikkawa; Sara Ebrahimiazar; Ryuichi Kimura; Kaichi Yoshizaki; Noriko Osumi
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Z-scores of the genes involved in the GO term “NEGATIVE_REGULATION_OF_NOTCH_SIGNALING_PATHWAY”, calculated from FPKM values derived from both E11.5 and E14.5 RNA-seq data.
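Computing a Z-score from FPKM values, as described here, amounts to standardizing each gene's expression across samples. A minimal sketch (the FPKM values below are invented, not taken from the E11.5/E14.5 data):

```python
import statistics

def z_scores(fpkm):
    """Standardize one gene's FPKM values across samples: (x - mean) / sd."""
    mean = statistics.mean(fpkm)
    sd = statistics.stdev(fpkm)   # sample standard deviation (n - 1 denominator)
    return [(x - mean) / sd for x in fpkm]

# Hypothetical FPKM values for one gene across four samples.
print([round(z, 2) for z in z_scores([10.0, 12.0, 30.0, 28.0])])
```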

  19. Table_1_Three levels of discrepancies in the records of trial sites in...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Jul 5, 2024
    + more versions
    Cite
    Borah, Rishima; Samanta, Anwesha Dhal; Saberwal, Gayatri (2024). Table_1_Three levels of discrepancies in the records of trial sites in India, registered with the European Union Clinical Trials Register.DOC [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001274768
    Explore at:
    Dataset updated
    Jul 5, 2024
    Authors
    Borah, Rishima; Samanta, Anwesha Dhal; Saberwal, Gayatri
    Area covered
    India
    Description

    Introduction: Clinical trial registries serve a key role in tracking the trial enterprise. We are interested in the records of trial sites in India. In this study, we focused on the European Union Clinical Trials Register (EUCTR). This registry is complex because a given study may have records from multiple countries in the EU, and therefore a given study ID may be represented by multiple records. We wished to determine what steps are required to identify the studies registered with EUCTR that list sites in India.
    Methods: We used two methodologies. Methodology A involved downloading the EUCTR database and querying it. Methodology B used the search function on the registry website.
    Results: Discrepant information, on whether or not a given study listed a site in India, was identified at three levels: (i) the methodology used to examine the database; (ii) the multiple records of a given study ID; and (iii) the multiple fields within a given record. In each of these situations, there was no basis to resolve the discrepancy one way or the other.
    Discussion: This work contributes to methodologies for more accurate searches of trial registries. It also adds to the efforts of those seeking transparency in trial data.
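The "download and query" step can be illustrated by grouping records on their shared study ID and flagging studies whose records disagree on the India-site field. The record layout and study IDs below are invented for illustration; the real EUCTR extract has a different schema:

```python
from collections import defaultdict

# Hypothetical rows from a downloaded registry extract: one record per
# (study ID, member-state protocol), with a flag for a listed India site.
records = [
    {"study_id": "2010-000001-01", "record": "DE", "india_site": True},
    {"study_id": "2010-000001-01", "record": "FR", "india_site": False},
    {"study_id": "2011-000002-02", "record": "ES", "india_site": True},
]

def discrepant_studies(rows):
    """Return study IDs whose multiple records disagree on the India-site flag."""
    by_study = defaultdict(set)
    for r in rows:
        by_study[r["study_id"]].add(r["india_site"])
    return sorted(sid for sid, flags in by_study.items() if len(flags) > 1)

print(discrepant_studies(records))  # ['2010-000001-01']
```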

  20. Assessor - Assessed Values

    • catalog.data.gov
    • datacatalog.cookcountyil.gov
    Updated Nov 8, 2025
    Cite
    datacatalog.cookcountyil.gov (2025). Assessor - Assessed Values [Dataset]. https://catalog.data.gov/dataset/assessor-historic-assessed-values
    Explore at:
    Dataset updated
    Nov 8, 2025
    Dataset provided by
    datacatalog.cookcountyil.gov
    Description

    Land, building, and total assessed values for all Cook County parcels, from 1999 to present. The Assessor's Office uses these values for reporting, evaluating assessment performance over time, and research. When working with Parcel Index Numbers (PINs), make sure to zero-pad them to 14 digits; some datasets may lose leading zeros for PINs when downloaded. This data is parcel-level: each row contains the assessed values for a single PIN for a single year.
    Important notes: Assessed values are available in three stages: 1) mailed, the initial values estimated by the Assessor's Office and mailed to taxpayers; 2) certified, the values after the Assessor's Office closes appeals; and 3) Board of Review certified, the values after the Board of Review closes appeals. The values in this data are assessed values, NOT market values. Assessed values must be adjusted by their level of assessment to arrive at market value, and note that levels of assessment have changed throughout the time period covered by this dataset. This dataset will be updated roughly contemporaneously (monthly) with the Assessor's website as values are mailed and certified; however, there may be small discrepancies between the Assessor's site and this dataset, as each pulls from a slightly different system. If you find a discrepancy, please email the Data Department using the contact link below. This dataset contains data for the current tax year, which may not yet be complete or final. Assessed values for any given year are subject to change until review and certification of values by the Cook County Board of Review, though in a few rare circumstances values may change for the current or past years after that. The row count for a given year is final once the Assessor has certified the assessment roll for all townships. Current property class codes, their levels of assessment, and descriptions can be found on the Assessor's website; note that class code details can change over time. For more information on the sourcing of the attached data and the preparation of this dataset, see the Assessor's Standard Operating Procedures for Open Data on GitHub. Read about the Assessor's 2025 Open Data Refresh.
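The zero-padding advice above can be applied mechanically. A small sketch (the function name and example PIN are illustrative, not part of the dataset):

```python
def normalize_pin(pin):
    """Zero-pad a Cook County Parcel Index Number to 14 digits.

    Downloads sometimes coerce PINs to integers, dropping leading zeros;
    display forms may also include hyphens.
    """
    digits = "".join(ch for ch in str(pin) if ch.isdigit())
    if len(digits) > 14:
        raise ValueError(f"PIN has more than 14 digits: {pin!r}")
    return digits.zfill(14)

print(normalize_pin(4081000180000))         # int with a lost leading zero
print(normalize_pin("04-08-100-018-0000"))  # hyphenated display form
```

Both calls recover the same 14-digit string, so the column can be normalized before joining against other parcel-level datasets.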
