7 datasets found
  1. Data from: Fostering cultures of open qualitative research: Dataset 2 –...

    • orda.shef.ac.uk
    xlsx
    Updated Oct 8, 2025
    Cite
    Matthew Hanchard; Itzel San Roman Pineda (2025). Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts [Dataset]. http://doi.org/10.15131/shef.data.23567223.v2
    Available download formats: xlsx
    Dataset updated
    Oct 8, 2025
    Dataset provided by
    The University of Sheffield
    Authors
    Matthew Hanchard; Itzel San Roman Pineda
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 23-Jun-2023 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman Institute. The dataset forms part of three outputs from a project titled ‘Fostering cultures of open qualitative research’ which ran from January 2023 to June 2023:

    · Fostering cultures of open qualitative research: Dataset 1 – Survey Responses
    · Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts
    · Fostering cultures of open qualitative research: Dataset 3 – Coding Book

    The project was funded with £13,913.85 of Research England monies held internally by the University of Sheffield - as part of their ‘Enhancing Research Cultures’ scheme 2022-2023.

    The dataset aligns with ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee (ref: 051118) on 23-Jan-2021. This includes due concern for participant anonymity and data management.

    ORDA has full permission to store this dataset and to make it open access for public re-use on the basis that no commercial gain will be made from reuse. It has been deposited under a CC-BY-NC license. Overall, this dataset comprises:

    · 15 x Interview transcripts - in .docx file format which can be opened with Microsoft Word, Google Docs, or an open-source equivalent.

    All participants have read and approved their transcripts and have had an opportunity to retract details should they wish to do so.

    Participants chose whether to be pseudonymised or named directly. The pseudonym can be used to identify individual participant responses in the qualitative coding held within the ‘Fostering cultures of open qualitative research: Dataset 3 – Coding Book’ files.

    For recruitment, 14 participants were selected based on their responses to the project survey, whilst one participant was recruited based on specific expertise.

    · 1 x Participant sheet – in .csv format which may be opened with Microsoft Excel, Google Sheets, or an open-source equivalent.

    The sheet provides socio-demographic detail on each participant alongside their main field of research and career stage. It includes a RespondentID field/column which can be used to connect interview participants with their responses to the survey questions in the accompanying ‘Fostering cultures of open qualitative research: Dataset 1 – Survey Responses’ files.
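    The RespondentID link described above can be sketched as a simple key-based join. This is a minimal illustration using Python's standard csv module: only the RespondentID column name comes from the description, while the other column names (Pseudonym, CareerStage, Q1) and all values are invented for the example. Participants with no matching survey row are skipped, which is an assumption about how one might handle a participant who did not answer the survey.

```python
import csv
import io

# Hypothetical stand-ins for the participant sheet and the survey responses.
# Only the RespondentID column name comes from the dataset description;
# every other column name and value here is invented for illustration.
PARTICIPANTS_CSV = """RespondentID,Pseudonym,CareerStage
101,Participant A,Early career
102,Participant B,Professor
"""

SURVEY_CSV = """RespondentID,Q1
101,Yes
102,No
103,Yes
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def link_by_respondent_id(participants, survey):
    """Attach each participant's survey row, matched on RespondentID."""
    survey_by_id = {row["RespondentID"]: row for row in survey}
    linked = []
    for person in participants:
        answers = survey_by_id.get(person["RespondentID"])
        if answers is not None:
            # Merge the two rows; the shared key appears once.
            linked.append({**answers, **person})
    return linked


linked_rows = link_by_respondent_id(
    read_rows(PARTICIPANTS_CSV), read_rows(SURVEY_CSV)
)
for row in linked_rows:
    print(row["RespondentID"], row["Pseudonym"], row["Q1"])
```

    With the real files, the two in-memory strings would be replaced by the downloaded participant sheet and survey export, and the lookup key would stay the same.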

    The project was undertaken by two staff:

    Co-investigator: Dr. Itzel San Roman Pineda
    ORCiD ID: 0000-0002-3785-8057
    i.sanromanpineda@sheffield.ac.uk
    Postdoctoral Research Assistant
    Labelled as ‘Researcher 1’ throughout the dataset

    Principal Investigator (corresponding dataset author): Dr. Matthew Hanchard
    ORCiD ID: 0000-0003-2460-8638
    m.s.hanchard@sheffield.ac.uk
    Research Associate
    iHuman Institute, Social Research Institutes, Faculty of Social Science
    Labelled as ‘Researcher 2’ throughout the dataset

  2. Use of Research Organizations Registry (ROR) identifiers in author academic...

    • dataverse.csuc.cat
    tsv, txt
    Updated Dec 13, 2022
    Cite
    Enrique Orduña-Malena; Nuria Bautista-Puig (2022). Use of Research Organizations Registry (ROR) identifiers in author academic profiles: the case of Google Scholar Profiles [dataset] [Dataset]. http://doi.org/10.34810/data579
    Available download formats: tsv (230204), txt (2661)
    Dataset updated
    Dec 13, 2022
    Dataset provided by
    CORA.Repositori de Dades de Recerca
    Authors
    Enrique Orduña-Malena; Nuria Bautista-Puig
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Document in .xlsx format. Contains 2 sheets. The first, entitled "profiles", has 10 columns (Orcid no.; id; Name; Domain; Citations; Description; Keywords; Field; Gender and Type) and 1032 rows (number of authors' profiles). The second sheet contains the discarded profiles. To protect personal data, Name column data values (Profiles Sheet: C Column; Discarded Sheet: B Column) have been replaced by a correlation number and Citations column data values (Profiles Sheet: E Column; Discarded Sheet: D Column) have been replaced by X value. The purpose of this work is to determine the use of Research Organizations Registry (ROR) IDs in author academic profiles, specifically in Google Scholar Profiles (GSP). To do this, all the Google Scholar profiles including the term ROR in any of the public descriptive fields were collected and analyzed. The results evidence a low use of ROR IDs (1,033 profiles), mainly from a few institutions.
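    The masking step described above (names replaced by a correlation number, citation counts replaced by X) could look roughly like the following sketch. It assumes the correlation number is a simple running index, which the description does not confirm; the sample rows are invented and only the Name, Domain and Citations column names come from the description.

```python
def anonymise_profiles(rows):
    """Return copies of rows with Name replaced by a running number and Citations by 'X'."""
    anonymised = []
    for number, row in enumerate(rows, start=1):
        masked = dict(row)             # copy so the input rows stay untouched
        masked["Name"] = str(number)   # assumed: correlation number is a running index
        masked["Citations"] = "X"      # citation counts are masked entirely
        anonymised.append(masked)
    return anonymised


# Invented sample rows, not taken from the dataset.
sample = [
    {"Name": "Example Author One", "Domain": "example.edu", "Citations": "120"},
    {"Name": "Example Author Two", "Domain": "example.org", "Citations": "45"},
]
masked_rows = anonymise_profiles(sample)
print(masked_rows)
```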

  3. Harry Potter Books and Movies Dataset

    • kaggle.com
    zip
    Updated Jun 7, 2022
    Cite
    MK Rehage (2022). Harry Potter Books and Movies Dataset [Dataset]. https://www.kaggle.com/datasets/mkrehage/harry-potter-books-and-movies-dataset
    Available download formats: zip (11487 bytes)
    Dataset updated
    Jun 7, 2022
    Authors
    MK Rehage
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    UPDATE 2: Since I was having encoding errors when doing EDA, I theorized my problem was at the source of my dataset: Google Sheets. I had originally created a Google Sheets doc and exported it into .csv and .txt files. Maybe that was giving me my encoding errors. I created a new .txt doc ("harrypotter_dataset.txt") -manually, not exporting it this time- to test out this theory. Thank you for your patience with me.

    UPDATE: I think it'd be easier if the index, budget, author, producer, etc. are the columns instead of rows. I updated this dataset to include a transposed version of the original documents. Thanks for your patience, as I am a beginner data analyst and still learning.

    For this dataset, I switched the X and Y axis (Columns and Rows), since there are only 8 installments. I figured scrolling through vertically would be more natural and instinctive than horizontally.

    For the books, I included statistics for the number of chapters and data for illustrators. For the movies, there are statistics for runtimes and budgets and data for producers, directors, etc. I did not include the number of pages because that would depend on which version or edition you are reading. There are VARIOUS among VARIOUS versions and editions of Harry Potter.
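    The row/column swap mentioned in the updates above is a plain table transposition. A minimal sketch in Python, using an invented miniature table with placeholder values rather than the dataset's actual contents:

```python
def transpose(table):
    """Swap rows and columns of a rectangular list-of-lists table."""
    return [list(column) for column in zip(*table)]


# Invented miniature table in the attribute-per-row layout; values are placeholders.
by_attribute = [
    ["installment", "1", "2"],
    ["chapters", "c1", "c2"],
    ["director", "d1", "d2"],
]
by_installment = transpose(by_attribute)
for row in by_installment:
    print(row)
```

    Transposing twice returns the original layout, so either orientation can be regenerated from the other.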

  4. ICSE 2025 - Artifact

    • figshare.com
    pdf
    Updated Jan 24, 2025
    Cite
    FARIDAH AKINOTCHO (2025). ICSE 2025 - Artifact [Dataset]. http://doi.org/10.6084/m9.figshare.28194605.v1
    Available download formats: pdf
    Dataset updated
    Jan 24, 2025
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    FARIDAH AKINOTCHO
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Mobile Application Coverage: The 30% Curse and Ways Forward

    ## Purpose

    In this artifact, we provide the information about our benchmarks used for manual and tool exploration. We include coverage results achieved by tools and human analysts as well as plots of the coverage progression over time for analysts. We further provide manual analysis results for our case study, more specifically extracted reasons for unreachability for the case study apps and extracted code-level properties, which constitute a ground truth for future work in coverage explainability. Finally, we identify a list of beyond-GUI exploration tools and categorize them for future work to take inspiration from. We are claiming available and reusable badges; the artifact is fully aligned with the results described in our paper and comprehensively documented.

    ## Provenance

    The paper preprint is available here: https://people.ece.ubc.ca/mjulia/publications/Mobile_Application_Coverage_ICSE2025.pdf

    ## Data

    The artifact submission is organized into five parts:

    - 'BenchInfo' excel sheet describing our experiment dataset
    - 'Coverage' folder containing coverage results for tools and analysts (RQ1)
    - 'Reasons' excel sheet describing our manually extracted reasons for unreachability (RQ2)
    - 'ActivationProperties' excel sheet describing our manually extracted code properties of unreached activities (RQ3)
    - 'ActivationProperties-Graph' pdf which presents combinations of the extracted code properties in a graph format.
    - 'BeyondGUI' folder containing information about identified techniques which go beyond GUI exploration.

    The artifact requires about 15MB of storage.

    ### Dataset: 'BenchInfo.xlsx'

    This file lists the full application dataset used for experiments in three tabs: 'BenchNotGP' (apps from AndroTest dataset which are not on Google Play), 'BenchGP' (apps from AndroTest which are also on Google Play) and 'TopGP' (top ranked free apps from Google Play).

    Each tab contains the following information:

    - Application Name
    - Package Name
    - Version Used (Latest)
    - Original Version
    - # Activities
    - Minimum SDK
    - Target SDK
    - # Permissions (in Manifest)
    - List of Permissions (in Manifest)
    - # Features (in Manifest)
    - List of Features (in Manifest)

    The 'TopGP' sheet also includes Google-Play-specific information, namely:

    - Category (one of 32 app categories)
    - Downloads
    - Popularity Rank

    The 'BenchGP' and 'BenchNotGP' sheets also include the original version (included in the AndroTest benchmark) and the source (one of F-Droid, Github or Google Code Archives).

    ### RQ1: 'Coverage'

    The 'Coverage' folder includes coverage results for tools and analysts, and is structured as follows:

    - 'CoverageResults.xlsx': An excel sheet containing the coverage results achieved by each human analyst and tool.
      - The first tab describes the results over all apps for analysts combined, tools combined, and analysts + tools, which map to Table II in the paper.
      - Each of the following 42 tabs, one per app in TopGP, marks the activities reached by Analyst 1, Analyst 2, Tool 1 (ape) and Tool 2 (fastbot), with an 'x' in the corresponding column to indicate that the activity was reached by the given agent.
    - 'Plots': A folder containing plots of the progressive coverage over time of analysts, split into one folder for 'Analyst1' and one for 'Analyst2'.
      - Each of the analysts' folders includes a subfolder per benchmark ('BenchNotGP', 'BenchGP' and 'TopGP'), containing as many png files as applications in the benchmark (respectively 47, 14 and 42 image files) named 'ANALYST_[X]_[APP_PACKAGE_NAME]'.png.

    ### RQ2: 'Reasons.xlsx'

    This file contains the extracted reasons for unreachability for the 11 apps manually analyzed.

    - The 'Summary' tab provides an overview of unreached activities per reason over all apps and per app, which corresponds to Table III in the paper.
    - The following 11 tabs, each corresponding to and named after a single application, describe the reasons associated with each activity of that application. Each column corresponds to a single reason and 'x' indicates that the activity is unreached due to the reason in that column. The top row sums up the total number of activities unreached due to a given reason in each column.
    - The activities at the bottom which are greyed out correspond to activities that were reached during exploration, and are thus excluded from the reason extraction.

    ### RQ3: 'ActivationProperties.xlsx'

    This file contains the full list of activation properties extracted for each of the 185 activities analyzed for RQ2. The first half of the columns (columns C-M) correspond to the reasons (excluding Transitive, Inconclusive and No Caller) and the second half (columns N-AD) correspond to properties described in Figure 5 in the paper, namely:

    - Exported
    - Activation Location:
      - Code: GUI/lifecycle, Other Android or App-specific
      - Manifest
    - Activation Guards:
      - Enforcement: In Code or In Resources
      - Restriction: Mandatory or Discretionary
    - Data:
      - Type: Parameters, Execution Dependencies
      - Format: Primitive, Strings, Objects

    The rows are grouped by application, and each row corresponds to an activity of that application. 'x' in a given column indicates the presence of the property in that column within the analyzed path to the activity. The third and fourth rows sum up the numbers and percentages for each property, as reported in Figure 5.

    ### RQ3: 'ActivationProperties-Graph.pdf'

    This file shows combinations of the individual properties listed in 'ActivationProperties.xlsx' in a graph format, extending the combinations described in Table IV with data (types and format) and reasons for unreachability.

    ### BeyondGUI

    This folder includes:

    - 'ToolInfo.xlsx': an excel sheet listing the identified 22 beyond-GUI papers, the date of publication, availability, invasiveness (Source code, Bytecode, framework, OS) and their targeting strategy (None, Manual or Automated).
    - 'ToolClassification.pdf': a pdf file describing our paper selection methodology as well as a classification of the techniques in terms of Invocation Strategy, Navigation Strategy, Value Generation Strategy, and Value Generation Types. We fully introduce these categories in the pdf file.

    ## Requirements & technology skills assumed by the reviewer evaluating the artifact

    The artifact consists entirely of Excel sheets, which can be opened with common spreadsheet software such as Microsoft Excel, together with coverage plots as PNG files and PDF files. It requires about 15MB of storage in total. No other specific technology skills are required of the reviewer evaluating the artifact.
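    To illustrate how the 'x' marks in the per-app coverage tabs translate into coverage figures, here is a minimal sketch. It is not the authors' tooling: the activity names and marks are invented, and the four column labels simply mirror the agents named in the description. Coverage is taken as the fraction of activities reached by at least one agent in a group.

```python
def coverage(marks, agents):
    """Fraction of activities reached by at least one of the given agents."""
    if not marks:
        return 0.0
    reached = sum(
        1 for row in marks.values()
        if any(row.get(agent) == "x" for agent in agents)
    )
    return reached / len(marks)


# Invented activity names and marks; the column names mirror the four agents
# listed in the description.
marks = {
    "MainActivity":     {"Analyst 1": "x", "Analyst 2": "x", "Tool 1": "x", "Tool 2": "x"},
    "SettingsActivity": {"Analyst 1": "x", "Analyst 2": "",  "Tool 1": "",  "Tool 2": ""},
    "DebugActivity":    {"Analyst 1": "",  "Analyst 2": "",  "Tool 1": "",  "Tool 2": ""},
    "ShareActivity":    {"Analyst 1": "",  "Analyst 2": "",  "Tool 1": "",  "Tool 2": "x"},
}

analysts = ["Analyst 1", "Analyst 2"]
tools = ["Tool 1", "Tool 2"]
analyst_cov = coverage(marks, analysts)
tool_cov = coverage(marks, tools)
combined_cov = coverage(marks, analysts + tools)
print(analyst_cov, tool_cov, combined_cov)
```

    In this toy table the analysts and the tools each reach half of the activities, while their union reaches three of four, which is the kind of "analysts combined, tools combined, analysts + tools" comparison the first tab summarises.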

  5. Temperature and Ice Cream Sales

    • kaggle.com
    zip
    Updated Feb 19, 2024
    Cite
    rephy (2024). Temperature and Ice Cream Sales [Dataset]. https://www.kaggle.com/datasets/raphaelmanayon/temperature-and-ice-cream-sales
    Available download formats: zip (1502 bytes)
    Dataset updated
    Feb 19, 2024
    Authors
    rephy
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Project is still being worked on.

    Initially, this dataset was just for a Google Data Analytics project, where I was given a task to accomplish with the data in a spreadsheet: look at the table given in the spreadsheet, and see if there's a correlation between temperature and revenue in ice cream sales. Eventually, I did see the pattern: higher temperatures usually meant more revenue, which seems realistic. However, I wanted to dig further into the data and perform a deeper analysis using a visualization, and maybe even a regression. My new questions were, "How strong is this correlation?" and "Can we represent the data using a linear regression?"
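    The two questions above ("How strong is this correlation?" and "Can we represent the data using a linear regression?") can be answered with a Pearson coefficient and an ordinary least-squares line. The sketch below uses only the standard library; the temperature and revenue values are invented for illustration and are not taken from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Invented numbers, not taken from the dataset.
temperature = [10.0, 20.0, 30.0, 40.0]
revenue = [25.0, 45.0, 65.0, 85.0]  # constructed as revenue = 2 * temperature + 5

r = pearson_r(temperature, revenue)
slope, intercept = fit_line(temperature, revenue)
print(r, slope, intercept)
```

    Because the sample points lie exactly on a line, the coefficient is 1 here; real sales data would give a value below 1, and how far below indicates how well a straight line describes the relationship.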

  6. TM Open Buildings - Philippines

    • kaggle.com
    zip
    Updated Dec 18, 2023
    Cite
    Thinking Machines Data Science (2023). TM Open Buildings - Philippines [Dataset]. https://www.kaggle.com/datasets/thinkdatasci/tm-open-buildings-philippines
    Available download formats: zip (1758170 bytes)
    Dataset updated
    Dec 18, 2023
    Authors
    Thinking Machines Data Science
    Area covered
    Philippines
    Description

    Thinking Machines Data Science is releasing TM Open Buildings, a dataset of manually-drawn building outlines covering 12 Philippine cities with detailed annotations on building and roof attributes as seen over satellite imagery. We contribute the buildings to OpenStreetMap and also make them available for download on Kaggle. This is made possible with the support from the Lacuna Fund.

    Producing the dataset

    The team has consulted HOTOSM Asia Pacific and community architects from the Philippine Action for Community-led Shelter Initiatives (PACSII) to validate our attributes and to ensure that our contributions are documented properly. We also looked at street-level views to check tags whenever available. We will take into consideration the feedback from local mappers, as local knowledge always takes precedence, and will always provide changeset comments that comply with OSM guidelines.

    You may view more details of our process in our wiki page. Kindly use our Github Issues tab to file any specific concerns about the dataset.

    License

    This TM Open Buildings dataset is made available by Thinking Machines under the Open Database License (ODbL). Any rights in individual contents of the database are licensed under the Database Contents License.

    Building Definitions

    We define the buildings we mapped, as well as the attributes included, in the table below. Please refer to our wiki page for more details.

    | Building Type  | Subtype | Definition                                                                             | Mapped Attributes                                      |
    |----------------|---------|----------------------------------------------------------------------------------------|--------------------------------------------------------|
    | Settlement     | Single  | Residential houses that are individually distinct from surrounding structures          | Roof material, Roof layout, Is within gated community? |
    |                | Dense   | Tight clusters of small residential houses that do not have distinguishable boundaries | -                                                      |
    | Non-settlement |         | Commercial, industrial, or institutional buildings                                     | Building height                                        |

    Coverage

    The dataset covers selected 250m x 250m tiles in 12 Philippine cities, namely Dagupan City, Palayan City, City of Navotas, City of Mandaluyong, City of Muntinlupa, Legazpi City, Tacloban City, Iloilo City, Mandaue City, Cagayan de Oro City, Davao City, and Zamboanga City. The tiles are chosen to focus on residential areas that lie on a wide variety of terrains (urban, coastal, riparian, agricultural, etc.). All settlements and non-settlements within each tile are drawn manually. Data on the locations of the tiles are given in the following table.

    Attributes

    The following table contains the definitions of the attributes and how they are tagged in OSM.

    | Attribute | Type | Characteristics | OSM Key and Tag |
    |-----------|------|-----------------|-----------------|
    | Roof Material | Natural/Galvanized Iron (GI)/Mixed | Looks rusty when old, silver/gray when new, lines and patches are usually evident. | roof:material = metal_sheet |
    | | Metal/Tiled | Whole roof is usually one solid color, tiled roofs have texture. | roof:material = roof_tiles |
    | | Concrete | Flat, usually has raised white edges, no visible roof “folds”, may be smooth or have objects on top. ...
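    As a rough illustration of how the attribute table maps onto OSM tags, the sketch below encodes only the two roof-material rows whose tags are visible in the truncated table above. The function name and the lookup structure are hypothetical, and any roof type whose tag is cut off in the listing (such as Concrete) is deliberately left unmapped rather than guessed.

```python
# Only the two mappings visible in the (truncated) attribute table are included.
ROOF_MATERIAL_TAGS = {
    "Natural/Galvanized Iron (GI)/Mixed": "metal_sheet",
    "Metal/Tiled": "roof_tiles",
}


def osm_tag_for_roof(roof_type):
    """Return a (key, value) OSM tag pair for a roof type, or None if unknown."""
    value = ROOF_MATERIAL_TAGS.get(roof_type)
    return None if value is None else ("roof:material", value)


print(osm_tag_for_roof("Metal/Tiled"))
print(osm_tag_for_roof("Concrete"))  # not mapped: its tag is truncated in the listing
```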

  7. YIEDL Competition Data (updated daily)

    • kaggle.com
    zip
    Updated Jan 10, 2025
    Cite
    Joakim Arvidsson (2025). YIEDL Competition Data (updated daily) [Dataset]. https://www.kaggle.com/datasets/joebeachcapital/yiedl-competition/versions/80
    Available download formats: zip (9274033415 bytes)
    Dataset updated
    Jan 10, 2025
    Authors
    Joakim Arvidsson
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset updates daily for the Numerai Crypto data (daily competition), and weekly on Mondays for the Yiedl.ai weekly competition. The Yiedl data contains the most recent dataset from yiedl.ai, as well as a quickstarter notebook. It now also includes the Numerai Crypto daily data (including historical), which may be useful in both competitions. It should be everything you need in order to get started in these cryptocurrency prediction competitions.

    You can apply for an airdrop of 100 $YIEDL tokens here, which you can use to stake on your predictions to earn more tokens if your predictions are correct (or burn tokens if they are not).

    Experienced data scientists can apply for a grant of an additional 5000 $YIEDL tokens, if approved.

    The $YIEDL token is a recently launched token on the Polygon blockchain. More information can be found at the below links.

