6 datasets found
  1. Regression analysis in Galaxy with car purchase price prediction dataset

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1 more
    Updated Aug 4, 2022
    Cite
    Kaivan Kamali (2022). Regression analysis in Galaxy with car purchase price prediction dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4660496
    Dataset updated
    Aug 4, 2022
    Dataset authored and provided by
    Kaivan Kamali
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Source/Credit: Michael Grogan https://github.com/MGCodesandStats https://github.com/MGCodesandStats/datasets/blob/master/cars.csv

    Sample dataset for regression analysis. Given 5 attributes (age, gender, miles driven per day, debt, and income), predict how much someone will spend on purchasing a car. All 5 input attributes have been scaled to the range 0 to 1. The training set has 723 examples and the test set has 242 examples.

    This dataset will be used in an upcoming Galaxy Training Network tutorial (https://training.galaxyproject.org/training-material/topics/statistics/) on the use of feedforward neural networks for regression analysis.
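
The description above says the inputs were scaled to the 0 to 1 range but does not say how. A minimal sketch, assuming ordinary min-max scaling, is shown below; the function name and the toy age values are invented for illustration, and only the 723/242 split sizes come from the listing.

```python
def min_max_scale(values):
    """Linearly rescale numbers so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy ages, made up for this example (not taken from cars.csv).
ages = [18, 30, 42, 66]
scaled_ages = min_max_scale(ages)

# Split sizes as stated in the dataset description.
train_size, test_size = 723, 242
test_fraction = test_size / (train_size + test_size)

print(scaled_ages)
print(round(test_fraction, 3))
```

With these sizes the test set is roughly a quarter of the 965 examples. To avoid leaking test information, the minimum and maximum would normally be taken from the training set only and reused for the test set.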

  2. Data_Sheet_2_You’re Prettier When You Smile: Construction and Validation of...

    • frontiersin.figshare.com
    Updated Jun 2, 2023
    Cite
    Mona Algner; Timo Lorenz (2023). Data_Sheet_2_You’re Prettier When You Smile: Construction and Validation of a Questionnaire to Assess Microaggressions Against Women in the Workplace.CSV [Dataset]. http://doi.org/10.3389/fpsyg.2022.809862.s002
    Available download formats: txt
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Mona Algner; Timo Lorenz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Gender microaggressions, especially their subtler forms, microinsults and microinvalidations, are by definition hard to discern. We aim to construct and validate a scale reflecting two facets of the microaggression taxonomy, microinsults and microinvalidations toward women in the workplace: the MIMI-16. Two studies were conducted (N1 = 500, N2 = 612). Using a genetic algorithm, a 16-item scale was developed and subsequently validated via confirmatory factor analyses (CFA) in three separate validation samples. Correlational analyses with organizational outcome measures were performed. The MIMI-16 exhibits good model fit in all validation samples (CFI = 0.936–0.960, TLI = 0.926–0.954, RMSEA = 0.046–0.062, SRMR = 0.042–0.049). Multigroup CFA suggested strict measurement invariance across all validation samples. Correlations were as expected and indicate internal and external validity. Research on gender microaggressions has mostly been qualitative. With the newly developed MIMI-16, we provide a reliable and valid quantitative instrument to measure gender microaggressions in the workplace.

  3. Replication Data for 'Gender (im)balance in the Russian cinema: on the...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Sep 24, 2024
    Cite
    Leontyeva, Xenia (2024). Replication Data for 'Gender (im)balance in the Russian cinema: on the screen and behind the camera' [Dataset]. http://doi.org/10.7910/DVN/ISVTB4
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Leontyeva, Xenia
    Description

    This publication contains two CSV datasets. They were first used in Xenia Leontyeva's master's thesis in sociology at HSE University Saint Petersburg, "Popularity Factors of Domestic Films: Gender Characteristics and State Support Measures" (2022), and later in the article by Xenia Leontyeva, Olessia Koltsova, and Deb Verhoeven, "Gender (Im)Balance in Russian Cinema: On the Screen and behind the Camera" (accepted in January 2024 by The Journal of Cultural Analytics).

    The first dataset (N=1285) includes all Russian films produced between 2008 and 2019 and theatrically released between December 1, 2008, and December 31, 2019. Distribution statistics cover the territory of the CIS, of which the Russian Federation is the biggest market. Budget information is available for 644 films. The second dataset contains markup for the Bechdel-Wallace test, as modified by Leontyeva, for 243 films, 193 of which have budget information. A supplement provides a detailed description of all variables and the R code that produces the tables, plots, and models for the article.

    The database was collected by Xenia Leontyeva while working at Nevafilm Research (until 2018) and later. The distribution data draw on the open database Russian Cinema Fund Analytics – RCFA (since 2015); the closed database comScore/Rentrak ("International Box Office Essential"), which serves major Hollywood studios (used since 2008 to fill gaps in open databases); Bookers' Bulletin (since 2011) and Russian Film Business Today magazines (since 2004); and data collected by Nevafilm Research employees from film distributors and producers. The rights to use and continue this dataset have been received from the Nevafilm company. The production data were taken from the State register of film distribution certificates, Kinopoisk.ru, and the films' credits.

  4. EUCA dataset Dataset

    • paperswithcode.com
    Updated Feb 3, 2021
    Cite
    Weina Jin; Jianyu Fan; Diane Gromala; Philippe Pasquier; Ghassan Hamarneh (2021). EUCA dataset Dataset [Dataset]. https://paperswithcode.com/dataset/euca-dataset
    Dataset updated
    Feb 3, 2021
    Authors
    Weina Jin; Jianyu Fan; Diane Gromala; Philippe Pasquier; Ghassan Hamarneh
    Description

    EUCA dataset description

    Associated Paper: EUCA: the End-User-Centered Explainable AI Framework

    Authors: Weina Jin, Jianyu Fan, Diane Gromala, Philippe Pasquier, Ghassan Hamarneh

    Introduction: The EUCA dataset is for modelling personalized or interactive explainable AI. It contains 309 data points of 32 end-users' preferences on 12 forms of explanation (including feature-, example-, and rule-based explanations). The data were collected from a user study on 32 layperson participants in the Greater Vancouver city area in 2019-2020. In the user study, the participants (P01-P32) were presented with AI-assisted critical tasks on house price prediction, health status prediction, purchasing a self-driving car, and studying for a biological exam (note 1). Within each task and for its given explanation goal (note 2), the participants selected and ranked the explanatory forms (note 3) that they considered most suitable.

    1 EUCA_EndUserXAI_ExplanatoryFormRanking.csv

    Column description:

    • Index - Participant number
    • Case - Task-explanation goal combination
    • accept to use AI? trust it? - Participant's response to whether they will use AI given the task and explanation goal
    • require explanation? - Participant's response to the question whether they request an explanation for the AI
    • 1st, 2nd, 3rd, ... - Explanatory form card selection and ranking
    • cards fulfill requirement? - After the card selection, participants were asked whether the selected card combination fulfilled their explainability requirement

    2 EUCA_EndUserXAI_demography.csv

    It contains the participants' demographics, including their age, gender, educational background, and their knowledge and attitudes toward AI.

    EUCA dataset zip file for download

    More Context for EUCA Dataset

    1 Critical tasks

    There are four tasks. The task labels and their corresponding task titles are:

    • house - Selling your house
    • car - Buying an autonomous driving vehicle
    • health - Personal health decision
    • bird - Learning bird species

    Please refer to the EUCA quantitative data analysis report for the storyboard of the tasks and explanation goals presented in the user study.

    2 Explanation goal

    End-users may have different goals or purposes when checking an explanation from AI. The EUCA dataset includes the following 11 explanation goals, each listed with its [label] in the dataset, its full name, and a description.

    [trust] Calibrate trust: trust is key to establishing a human-AI decision-making partnership. Since users can easily distrust or overtrust AI, it is important to calibrate the trust to reflect the capabilities of AI systems.

    [safe] Ensure safety: users need to ensure safety of the decision consequences.

    [bias] Detect bias: users need to ensure the decision is impartial and unbiased.

    [unexpect] Resolve disagreement with AI: the AI prediction is unexpected and there are disagreements between users and AI.

    [expected] Expected: the AI's prediction is expected and aligns with users' expectations.

    [differentiate] Differentiate similar instances: due to the consequences of wrong decisions, users sometimes need to discern similar instances or outcomes. For example, a doctor differentiates whether the diagnosis is a benign or malignant tumor.

    [learning] Learn: users need to gain knowledge, improve their problem-solving skills, and discover new knowledge.

    [control] Improve: users seek causal factors to control and improve the predicted outcome.

    [communicate] Communicate with stakeholders: many critical decision-making processes involve multiple stakeholders, and users need to discuss the decision with them.

    [report] Generate reports: users need to utilize the explanations to perform particular tasks such as report production. For example, a radiologist generates a medical report on a patient's X-ray image.

    [multi] Trade-off multiple objectives: AI may be optimized on an incomplete objective while the users seek to fulfill multiple objectives in real-world applications. For example, a doctor needs to ensure a treatment plan is effective as well as has acceptable patient adherence. Ethical and legal requirements may also be included as objectives.

    3 Explanatory form

    The following 12 explanatory forms are end-user-friendly, i.e., no technical knowledge is required for the end-user to interpret the explanation.

    Feature-Based Explanation

    • Feature Attribution - fa
      Note: for tasks that have images as input data, feature attribution is denoted by the following two cards:
      ir: important regions (a.k.a. heat map or saliency map)
      irc: important regions with their feature contribution percentage
    • Feature Shape - fs
    • Feature Interaction - fi

    Example-Based Explanation

    • Similar Example - se
    • Typical Example - te
    • Counterfactual Example - ce
      Note: for the counterfactual example, two visual variations were used in the user study:
      cet: counterfactual example with transition from one example to the counterfactual one
      ceh: counterfactual example with the contrastive feature highlighted

    Rule-Based Explanation

    • Rule - rt
    • Decision Tree - dt
    • Decision Flow - df

    Supplementary Information

    • Input
    • Output
    • Performance
    • Dataset - prior (output prediction with prior distribution of each class in the training set)

    Note: occasionally there is a wild card, which means the participant drew the card themselves. It is indicated as 'wc'.

    For visual examples of each explanatory form card, please refer to the Explanatory_form_labels.pdf document.

    Link to the details on users' requirements on different explanatory forms

    Code and report for EUCA data quantitative analysis

    • EUCA data analysis code
    • EUCA quantitative data analysis report

    EUCA data citation

    @article{jin2021euca,
      title={EUCA: the End-User-Centered Explainable AI Framework},
      author={Weina Jin and Jianyu Fan and Diane Gromala and Philippe Pasquier and Ghassan Hamarneh},
      year={2021},
      eprint={2102.02437},
      archivePrefix={arXiv},
      primaryClass={cs.HC}
    }

  5. Data from: Customer Churn Dataset

    • ieee-dataport.org
    • kaggle.com
    Updated Jun 4, 2024
    Cite
    Usman JOY (2024). Customer Churn Dataset [Dataset]. http://doi.org/10.21227/wc9d-b672
    Dataset updated
    Jun 4, 2024
    Dataset provided by
    IEEE Dataport
    Authors
    Usman JOY
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Customer log dataset is a 12.5 GB JSON file containing 18 columns and 26,259,199 records. There are 12 string columns and 6 numeric columns, any of which may contain null or NaN values. The columns are userId, artist, auth, firstName, gender, itemInSession, lastName, length, level, location, method, page, registration, sessionId, song, status, ts, and userAgent. As the column names suggest, the dataset contains various user-related information, such as user identifiers, demographic details (firstName, lastName, gender), interaction details (artist, song, length, itemInSession, sessionId, registration, lastinteraction), and technical details (userAgent, method, page, location, status, level, auth).
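
Because the file is described as 12.5 GB with possible null or NaN values, one way to profile it is to stream it record by record instead of loading it whole. The sketch below assumes a JSON Lines layout (one JSON object per line), which the listing does not confirm; the helper names and the two sample records are invented for illustration, and only the column names come from the description.

```python
import json
import math
from collections import Counter

# Column names as listed in the dataset description.
COLUMNS = [
    "userId", "artist", "auth", "firstName", "gender", "itemInSession",
    "lastName", "length", "level", "location", "method", "page",
    "registration", "sessionId", "song", "status", "ts", "userAgent",
]


def is_missing(value):
    """Treat JSON null and floating-point NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_missing(lines):
    """Stream over an iterable of JSON lines and count missing values per column.

    A column that is absent from a record is counted as missing as well.
    """
    missing = Counter()
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)  # Python's json accepts the literal NaN by default
        total += 1
        for column in COLUMNS:
            if is_missing(record.get(column)):
                missing[column] += 1
    return total, missing


# Two invented records standing in for lines of the real file.
sample = [
    '{"userId": "1", "artist": null, "length": NaN, "page": "Home"}',
    '{"userId": "2", "artist": "Some Artist", "length": 201.5, "page": "NextSong"}',
]
total, missing = count_missing(sample)
print(total, missing["artist"], missing["length"])
```

An open file object can be passed in place of `sample`, since iterating over a text file yields one line at a time and keeps memory use flat regardless of file size.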

  6. Data_Sheet_2_Construction and Verification of a Predictive Model for Risk...

    • figshare.com
    Updated Jun 14, 2023
    Cite
    Yaqiong He; Peng Liu; Leyun Xie; Saizhen Zeng; Huashan Lin; Bing Zhang; Jianbin Liu (2023). Data_Sheet_2_Construction and Verification of a Predictive Model for Risk Factors in Children With Severe Adenoviral Pneumonia.CSV [Dataset]. http://doi.org/10.3389/fped.2022.874822.s002
    Available download formats: txt
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    Frontiers
    Authors
    Yaqiong He; Peng Liu; Leyun Xie; Saizhen Zeng; Huashan Lin; Bing Zhang; Jianbin Liu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Objective: To construct and validate a predictive model for risk factors in children with severe adenoviral pneumonia based on chest low-dose CT imaging and clinical features.

    Methods: A total of 177 patients with adenoviral pneumonia who underwent low-dose CT examination were collected between January 2019 and August 2019. According to the assessment criteria for severe pneumonia, patients were divided into a mild group (N = 125) and a severe group (N = 52). All cases were divided into a training cohort (N = 125) and a validation cohort (N = 52). We constructed a prediction model by drawing a nomogram and verified the predictive efficacy of the model through the ROC curve, calibration curve, and decision curve analysis.

    Results: The difference between the mild and severe adenovirus pneumonia groups was statistically significant (P < 0.05) in gender, age, weight, body temperature, L/N ratio, LDH, ALT, AST, CK-MB, ADV DNA, bronchial inflation sign, emphysema, ground glass sign, bronchial wall thickening, bronchiectasis, pleural effusion, consolidation score, and lobular inflammation score. Multivariate logistic regression analysis showed that gender, LDH value, emphysema, consolidation score, and lobular inflammation score were independent risk factors for severe adenovirus pneumonia in children. Logistic regression was employed to construct a clinical model, an imaging semantic feature model, and a combined model. The AUC values of the three models on the training set were 0.85 (0.77–0.94), 0.83 (0.75–0.91), and 0.91 (0.85–0.97). The AUC values on the validation set were 0.77 (0.64–0.91), 0.83 (0.71–0.94), and 0.85 (0.73–0.96), respectively. The calibration curves of the three models showed good fit. The clinical decision curve analysis demonstrates the clinical application value of the nomogram prediction model.

    Conclusion: The prediction model based on chest low-dose CT image characteristics and clinical characteristics has relatively clear predictive value in distinguishing mild from severe adenovirus pneumonia in children and might provide a new method for early clinical prediction of the outcome of adenovirus pneumonia in children.
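
The AUC values quoted above summarize how well each model ranks severe cases above mild ones. As a rough illustration of what the number means, and not a reproduction of the authors' analysis, the sketch below computes AUC as the fraction of (severe, mild) pairs in which the severe case receives the higher score, counting ties as half. The function name, labels, and scores are invented toy values.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for the positive class (e.g. severe), 0 for the negative class.
    scores: model outputs, where higher means "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: two mild (0) and two severe (1) cases with made-up scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 0.75
```

The pairwise form is quadratic in the number of cases, which is fine for cohorts of this size; for larger data a rank-based or library implementation would be the usual choice.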
