Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 23-Jun-2023 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman Institute. The dataset forms part of three outputs from a project titled ‘Fostering cultures of open qualitative research’ which ran from January 2023 to June 2023:
· Fostering cultures of open qualitative research: Dataset 1 – Survey Responses
· Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts
· Fostering cultures of open qualitative research: Dataset 3 – Coding Book
The project was funded with £13,913.85 of Research England monies held internally by the University of Sheffield - as part of their ‘Enhancing Research Cultures’ scheme 2022-2023.
The dataset aligns with ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee (ref: 051118) on 23-Jan-2021. This includes due concern for participant anonymity and data management.
ORDA has full permission to store this dataset and to make it open access for public re-use on the basis that no commercial gain will be made from reuse. It has been deposited under a CC-BY-NC license. Overall, this dataset comprises:
· 15 x Interview transcripts - in .docx file format which can be opened with Microsoft Word, Google Docs, or an open-source equivalent.
All participants have read and approved their transcripts and have had an opportunity to retract details should they wish to do so.
Participants chose whether to be pseudonymised or named directly. The pseudonym can be used to identify individual participant responses in the qualitative coding held within the ‘Fostering cultures of open qualitative research: Dataset 3 – Coding Book’ files.
For recruitment, 14 x participants were selected based on their responses to the project survey, whilst one participant was recruited based on specific expertise.
· 1 x Participant sheet – in .csv format which may be opened with Microsoft Excel, Google Sheets, or an open-source equivalent.
The sheet provides socio-demographic detail on each participant alongside their main field of research and career stage. It includes a RespondentID field/column which can be used to connect interview participants with their responses to the survey questions in the accompanying ‘Fostering cultures of open qualitative research: Dataset 1 – Survey Responses’ files.
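The RespondentID link described above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library: the in-memory CSV text, the column names other than RespondentID (Pseudonym, CareerStage, Q1) and the function name are all hypothetical stand-ins and should be checked against the deposited files.

```python
import csv
import io

# Hypothetical stand-ins for the two files. Only the RespondentID column is
# named in the dataset description; every other column here is an assumption.
PARTICIPANT_SHEET = """RespondentID,Pseudonym,CareerStage
101,Participant A,Early career
102,Participant B,Senior
"""

SURVEY_RESPONSES = """RespondentID,Q1
101,Yes
103,No
"""


def link_by_respondent_id(participants_csv, survey_csv):
    """Attach each participant's survey row (if any) using RespondentID."""
    survey_by_id = {
        row["RespondentID"]: row
        for row in csv.DictReader(io.StringIO(survey_csv))
    }
    linked = []
    for participant in csv.DictReader(io.StringIO(participants_csv)):
        # A participant without a survey row simply gets no survey fields.
        survey_row = survey_by_id.get(participant["RespondentID"], {})
        # Participant fields win on a name clash.
        linked.append({**survey_row, **participant})
    return linked


linked_rows = link_by_respondent_id(PARTICIPANT_SHEET, SURVEY_RESPONSES)
for row in linked_rows:
    print(row["RespondentID"], row["Pseudonym"], row.get("Q1"))
```

The lookup uses `.get` so that a participant with no matching survey row (for example, someone recruited for specific expertise rather than through the survey) is kept rather than dropped.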
The project was undertaken by two staff:
Co-investigator: Dr. Itzel San Roman Pineda
ORCiD ID: 0000-0002-3785-8057
i.sanromanpineda@sheffield.ac.uk
Postdoctoral Research Assistant
Labelled as ‘Researcher 1’ throughout the dataset
Principal Investigator (corresponding dataset author): Dr. Matthew Hanchard
ORCiD ID: 0000-0003-2460-8638
m.s.hanchard@sheffield.ac.uk
Research Associate
iHuman Institute, Social Research Institutes, Faculty of Social Science
Labelled as ‘Researcher 2’ throughout the dataset
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Document in .xlsx format. Contains 2 sheets. The first, entitled "profiles", has 10 columns (Orcid no.; id; Name; Domain; Citations; Description; Keywords; Field; Gender and Type) and 1032 rows (number of authors' profiles). The second sheet contains the discarded profiles. To protect personal data, the values in the Name column (Profiles sheet: column C; Discarded sheet: column B) have been replaced by a correlation number, and the values in the Citations column (Profiles sheet: column E; Discarded sheet: column D) have been replaced by the value X. The purpose of this work is to determine the use of Research Organization Registry (ROR) IDs in author academic profiles, specifically in Google Scholar Profiles (GSP). To do this, all the Google Scholar profiles including the term ROR in any of the public descriptive fields were collected and analyzed. The results show a low use of ROR IDs (1,033 profiles), mainly from a few institutions.
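As a rough illustration of the kind of masking described above, the sketch below replaces each name with a number and each citation count with "X". It is not the authors' procedure: the running-number scheme, the function name and the sample rows are assumptions made purely for illustration.

```python
def anonymise_profiles(rows):
    """Replace names with a running number and mask citation counts with 'X'.

    Assumption: the same name always maps to the same number. The dataset
    description does not say how its correlation numbers were assigned.
    """
    number_for_name = {}
    anonymised = []
    for row in rows:
        name = row["Name"]
        if name not in number_for_name:
            number_for_name[name] = len(number_for_name) + 1
        masked = dict(row)  # copy, so the input rows are left untouched
        masked["Name"] = number_for_name[name]
        masked["Citations"] = "X"
        anonymised.append(masked)
    return anonymised


# Made-up sample rows, not taken from the dataset.
sample = [
    {"Name": "Author One", "Domain": "example.edu", "Citations": 120},
    {"Name": "Author Two", "Domain": "example.org", "Citations": 7},
    {"Name": "Author One", "Domain": "example.edu", "Citations": 120},
]
masked_rows = anonymise_profiles(sample)
for row in masked_rows:
    print(row)
```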
https://creativecommons.org/publicdomain/zero/1.0/
UPDATE 2: Since I was having encoding errors when doing EDA, I theorized my problem was at the source of my dataset: Google Sheets. I had originally created a Google Sheets doc and exported it into .csv and .txt files. Maybe that was giving me my encoding errors. I created a new .txt doc ("harrypotter_dataset.txt") -manually, not exporting it this time- to test out this theory. Thank you for your patience with me.
UPDATE: I think it'd be easier if the index, budget, author, producer, etc. are the columns instead of rows. I updated this dataset to include a transposed version of the original documents. Thanks for your patience, as I am a beginner data analyst and still learning.
For this dataset, I switched the X and Y axes (Columns and Rows), since there are only 8 installments. I figured scrolling through vertically would be more natural and instinctive than horizontally.
For the books, I included statistics for the number of chapters and data for illustrators. For the movies, there are statistics for runtimes and budgets and data for producers, directors, etc. I did not include the number of pages because that would depend on which version or edition you are reading. There are MANY versions and editions of Harry Potter.
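For anyone curious how the row/column switch mentioned in the updates above can be done in code, here is a small sketch using only the Python standard library. The CSV excerpt is a made-up placeholder (not the real file contents), and the function name is hypothetical.

```python
import csv
import io

# Hypothetical excerpt in the original layout: one row per attribute,
# one column per installment. Values are placeholders, not the dataset's.
ORIGINAL = """index,1,2
title,Title A,Title B
chapters,10,12
"""


def transpose_csv(text):
    """Return CSV text with rows and columns swapped."""
    rows = list(csv.reader(io.StringIO(text)))
    # zip(*rows) collects the i-th cell of every row into the i-th new row.
    transposed = [list(column) for column in zip(*rows)]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(transposed)
    return out.getvalue()


transposed_text = transpose_csv(ORIGINAL)
print(transposed_text)
```

Note that `zip` stops at the shortest row, so this only works cleanly when every row has the same number of cells. When reading from an actual file instead of a string, passing an explicit `encoding` (for example `encoding="utf-8"`) to `open()` may help with the kind of encoding errors mentioned in the second update, though that depends on how the file was saved.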
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Project is still being worked on.
Initially, this dataset was just for a Google Data Analytics project, where I was given a task to accomplish with the data in a spreadsheet: look at the table given in the spreadsheet, and see if there's a correlation between temperature and revenue in ice cream sales. Eventually, I did see the pattern: higher temperatures usually meant more revenue, which seems realistic. However, I wanted to dig further into the data and perform a deeper analysis using a visualization, and maybe even a regression. My new questions were, "How strong is this correlation?" and "Can we represent the data using a linear regression?"
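The two follow-up questions can be explored with a Pearson correlation coefficient and an ordinary least-squares line. The sketch below is only an illustration: the temperature and revenue values are made up (they are NOT from the spreadsheet), and the function names are hypothetical.

```python
from math import sqrt

# Made-up illustrative values, NOT taken from the dataset.
temperatures = [20.0, 25.0, 30.0, 35.0]
revenues = [210.0, 260.0, 300.0, 350.0]


def pearson_r(xs, ys):
    """Pearson correlation coefficient: strength of the linear relationship."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept for revenue = slope * temp + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


r = pearson_r(temperatures, revenues)
slope, intercept = fit_line(temperatures, revenues)
print(f"r = {r:.3f}, revenue ~ {slope:.2f} * temperature + {intercept:.2f}")
```

A value of r close to 1 suggests a strong positive linear relationship, which would support representing the data with a linear regression. A scatter plot is still worth checking, since r alone can hide curvature or outliers.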
Thinking Machines Data Science is releasing TM Open Buildings, a dataset of manually drawn building outlines covering 12 Philippine cities, with detailed annotations on building and roof attributes as seen in satellite imagery. We contributed the buildings to OpenStreetMap and also made them available for download on Kaggle. This was made possible with support from the Lacuna Fund.
The team has consulted HOTOSM Asia Pacific and community architects from the Philippine Action for Community-led Shelter Initiatives (PACSII) to validate our attributes and to ensure that our contributions are documented properly. We also looked at street-level views to check tags whenever available. We will take into consideration the feedback from local mappers, as local knowledge always takes precedence, and will always provide changeset comments that comply with OSM guidelines.
You may view more details of our process in our wiki page. Kindly use our GitHub Issues tab to file any specific concerns about the dataset.
This TM Open Buildings dataset is made available by Thinking Machines under the Open Database License (ODbL). Any rights in individual contents of the database are licensed under the Database Contents License.
We define the buildings we mapped, as well as the attributes included, in the table below. Please refer to our wiki page for more details.
| Building Type |Subtype | Definition | Mapped Attributes |
|----------------|--------|----------------------------------------------------------------------------------------|---------------------------------------------------------|
| Settlement | Single | Residential houses that are individually distinct from surrounding structures | Roof material, Roof layout, Is within gated community? |
| | Dense | Tight clusters of small residential houses that do not have distinguishable boundaries | - |
| Non-settlement | | Commercial, industrial, or institutional buildings | Building height |
The dataset covers selected 250m x 250m tiles in 12 Philippine cities, namely Dagupan City, Palayan City, City of Navotas, City of Mandaluyong, City of Muntinlupa, Legazpi City, Tacloban City, Iloilo City, Mandaue City, Cagayan de Oro City, Davao City, and Zamboanga City. The tiles are chosen to focus on residential areas that lie on a wide variety of terrains (urban, coastal, riparian, agricultural, etc.). All settlements and non-settlements within each tile are drawn manually. Data on the locations of the tiles are given in the following table.
The following table contains the definitions of the attributes and how each is tagged in OSM.
| Attribute | Type | Characteristics | OSM Key and Tag |
|------------------------|------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|
| Roof Material | Natural/Galvanized Iron (GI)/Mixed | Looks rusty when old, silver/gray when new, lines and patches are usually evident. | roof:material = metal_sheet |
| | Metal/Tiled | Whole roof is usually one solid color, tiled roofs have texture. | roof:material = roof_tiles |
| | Concrete | Flat, usually has raised white edges, no visible roof “folds”, may be smooth or have objects on top. ...
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset updates daily for the Numerai Crypto data (daily competition), and weekly on Mondays for the Yiedl.ai weekly competition. The Yiedl data contains the most recent dataset from yiedl.ai, as well as a quickstarter notebook. It now also includes the Numerai Crypto daily data (including historical), which may be useful in both competitions. It should be everything you need in order to get started in these cryptocurrency prediction competitions.
You can apply for an airdrop of 100 $YIEDL tokens here, which you can use to stake on your predictions to earn more tokens if your predictions are correct (or burn tokens if they are not).
Experienced data scientists can apply for a grant of an additional 5000 $YIEDL tokens, if approved.
The $YIEDL token is a recently launched token on the Polygon blockchain. More information can be found at the links below.