10 datasets found
  1. Data from: HOW TO PERFORM A META-ANALYSIS: A PRACTICAL STEP-BY-STEP GUIDE...

    • scielo.figshare.com
    • datasetcatalog.nlm.nih.gov
    tiff
    Updated Jun 4, 2023
    Cite
    Diego Ariel de Lima; Camilo Partezani Helito; Lana Lacerda de Lima; Renata Clazzer; Romeu Krause Gonçalves; Olavo Pires de Camargo (2023). HOW TO PERFORM A META-ANALYSIS: A PRACTICAL STEP-BY-STEP GUIDE USING R SOFTWARE AND RSTUDIO [Dataset]. http://doi.org/10.6084/m9.figshare.19899537.v1
    Available download formats: tiff
    Dataset updated: Jun 4, 2023
    Dataset provided by: SciELO journals
    Authors: Diego Ariel de Lima; Camilo Partezani Helito; Lana Lacerda de Lima; Renata Clazzer; Romeu Krause Gonçalves; Olavo Pires de Camargo
    License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/ (license information was derived automatically)

    Description

    ABSTRACT Meta-analysis is an adequate statistical technique to combine results from different studies, and its use has been growing in the medical field. Thus, not only knowing how to interpret a meta-analysis, but also knowing how to perform one, is fundamental today. Therefore, the objective of this article is to present the basic concepts and serve as a guide for conducting a meta-analysis using R and RStudio software. For this, the reader has access to the basic commands in the R and RStudio software necessary for conducting a meta-analysis. The advantage of R is that it is free software. For a better understanding of the commands, two examples were presented in a practical way, in addition to revising some basic concepts of this statistical technique. It is assumed that the data necessary for the meta-analysis have already been collected; that is, the description of methodologies for systematic review is not a subject discussed here. Finally, it is worth remembering that there are many other techniques used in meta-analyses that were not addressed in this work. However, with the two examples used, the article already enables the reader to proceed with good and robust meta-analyses. Level of Evidence V, Expert Opinion.
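    As an illustration only (the abstract above does not name a specific R package), a minimal random-effects meta-analysis in R might look like the following sketch, assuming the metafor package and made-up effect sizes:

    # Minimal sketch, not taken from the article: a random-effects meta-analysis
    # with the 'metafor' package on hypothetical log odds ratios (yi) and
    # sampling variances (vi).
    library(metafor)  # install.packages("metafor") if needed

    dat <- data.frame(
      study = paste("Study", 1:5),
      yi = c(0.20, -0.10, 0.35, 0.15, 0.05),
      vi = c(0.04, 0.03, 0.06, 0.05, 0.02)
    )

    # Fit a random-effects model and inspect the pooled estimate
    res <- rma(yi = yi, vi = vi, data = dat, method = "REML")
    summary(res)

    # Forest plot of the individual studies and the pooled effect
    forest(res, slab = dat$study)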

  2. Data from: Visual Continuous Time Preferences

    • data.mendeley.com
    Updated Jun 12, 2023
    + more versions
    Cite
    Benjamin Prisse (2023). Visual Continuous Time Preferences [Dataset]. http://doi.org/10.17632/ms63y77fcf.5
    Dataset updated: Jun 12, 2023
    Authors: Benjamin Prisse
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    This file compiles the different datasets used and analyses made in the paper "Visual Continuous Time Preferences". Both RStudio and Stata were used for the analysis: the first for descriptive statistics and graphs, the second for regressions. We join the datasets for both analyses.

    "Analysis VCTP - RStudio.R" is the RStudio analysis. "Analysis VCTP - Stata.do" is the Stata analysis.

    The RStudio datasets are: "data_Seville.xlsx" is the dataset of observations. "FormularioEng.xlsx" is the dataset of control variables.

    The Stata datasets are: "data_Seville_Stata.dta" is the dataset of observations. "FormularioEng.dta" is the dataset of control variables
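    A minimal sketch of loading these files into R, assuming the readxl and haven packages (the description does not say which packages the analysis scripts actually use):

    # Assumes the 'readxl' and 'haven' packages; file names are those listed above.
    library(readxl)
    library(haven)

    # RStudio (Excel) datasets
    observations <- read_excel("data_Seville.xlsx")
    controls     <- read_excel("FormularioEng.xlsx")

    # Stata datasets
    observations_stata <- read_dta("data_Seville_Stata.dta")
    controls_stata     <- read_dta("FormularioEng.dta")

    str(observations)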

    Additionally, the experimental instructions for the six experimental conditions are also available: "Hypothetical MPL-VCTP.pdf" is the instructions and task for hypothetical payment and MPL answered before VCTP. "Hypothetical VCTP-MPL.pdf" is the instructions and task for hypothetical payment and VCTP answered before MPL. "OneTenth MPL-VCTP.pdf" is the instructions and task for BRIS payment and MPL answered before VCTP. "OneTenth VCTP-MPL.pdf" is the instructions and task for BRIS payment and VCTP answered before MPL. "Real MPL-VCTP.pdf" is the instructions and task for real payment and MPL answered before VCTP. "Real VCTP-MPL.pdf" is the instructions and task for real payment and VCTP answered before MPL.

  3. Data and tools for studying isograms

    • figshare.com
    Updated Jul 31, 2017
    Cite
    Florian Breit (2017). Data and tools for studying isograms [Dataset]. http://doi.org/10.6084/m9.figshare.5245810.v1
    Available download formats: application/x-sqlite3
    Dataset updated: Jul 31, 2017
    Dataset provided by: figshare
    Authors: Florian Breit
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    A collection of datasets and Python scripts for extraction and analysis of isograms (and some palindromes and tautonyms) from corpus-based word lists, specifically Google Ngram and the British National Corpus (BNC). Below follows a brief description, first, of the included datasets and, second, of the included scripts.

    1. Datasets

    The data from English Google Ngrams and the BNC is available in two formats: as a plain-text CSV file and as a SQLite3 database.

    1.1 CSV format

    The CSV files for each dataset actually come in two parts: one labelled ".csv" and one ".totals". The ".csv" file contains the actual extracted data, and the ".totals" file contains some basic summary statistics about the ".csv" dataset with the same name. The CSV files contain one row per data point, with the columns separated by a single tab stop. There are no labels at the top of the files. Each line has the following columns, in this order (the labels below are what I use in the database, which has an identical structure; see the section below). A short R loading sketch follows the column list.

    Label Data type Description

    isogramy int The order of isogramy, e.g. "2" is a second order isogram

    length int The length of the word in letters

    word text The actual word/isogram in ASCII

    source_pos text The Part of Speech tag from the original corpus

    count int Token count (total number of occurrences)

    vol_count int Volume count (number of different sources which contain the word)

    count_per_million int Token count per million words

    vol_count_as_percent int Volume count as percentage of the total number of volumes

    is_palindrome bool Whether the word is a palindrome (1) or not (0)

    is_tautonym bool Whether the word is a tautonym (1) or not (0)
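    A minimal sketch of reading one of the tab-separated ".csv" files in R and attaching the column labels above (the file name ngrams-isograms.csv is the one expected by the database step described later; adjust as needed):

    # Read a tab-separated ".csv" file (no header row) and attach the column
    # labels described in the table above.
    cols <- c("isogramy", "length", "word", "source_pos", "count", "vol_count",
              "count_per_million", "vol_count_as_percent", "is_palindrome",
              "is_tautonym")
    isograms <- read.delim("ngrams-isograms.csv", header = FALSE, sep = "\t",
                           quote = "", col.names = cols)
    str(isograms)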

    The ".totals" files have a slightly different format, with one row per data point, where the first column is the label and the second column is the associated value. The ".totals" files contain the following data:

    Label Data type Description

    !total_1grams int The total number of words in the corpus

    !total_volumes int The total number of volumes (individual sources) in the corpus

    !total_isograms int The total number of isograms found in the corpus (before compacting)

    !total_palindromes int How many of the isograms found are palindromes

    !total_tautonyms int How many of the isograms found are tautonyms

    The CSV files are mainly useful for further automated data processing. For working with the data set directly (e.g. to do statistics or cross-check entries), I would recommend using the database format described below.

    1.2 SQLite database format

    On the other hand, the SQLite database combines the data from all four of the plain text files, and adds various useful combinations of the two datasets, namely:

    • Compacted versions of each dataset, where identical headwords are combined into a single entry.

    • A combined compacted dataset, combining and compacting the data from both Ngrams and the BNC.

    • An intersected dataset, which contains only those words which are found in both the Ngrams and the BNC dataset.

    The intersected dataset is by far the least noisy, but is missing some real isograms, too. The columns/layout of each of the tables in the database is identical to that described for the CSV/.totals files above. To get an idea of the various ways the database can be queried for various bits of data, see the R script described below, which computes statistics based on the SQLite database.

    2. Scripts

    There are three scripts: one for tidying Ngram and BNC word lists and extracting isograms, one to create a neat SQLite database from the output, and one to compute some basic statistics from the data. The first script can be run using Python 3, the second using SQLite 3 from the command line, and the third in R/RStudio (R version 3).

    2.1 Source data

    The scripts were written to work with word lists from Google Ngram and the BNC, which can be obtained from http://storage.googleapis.com/books/ngrams/books/datasetsv2.html and https://www.kilgarriff.co.uk/bnc-readme.html (download all.al.gz). For Ngram the script expects the path to the directory containing the various files; for BNC, the direct path to the *.gz file.

    2.2 Data preparation

    Before processing proper, the word lists need to be tidied to exclude superfluous material and some of the most obvious noise. This will also bring them into a uniform format. Tidying and reformatting can be done by running one of the following commands:

    python isograms.py --ngrams --indir=INDIR --outfile=OUTFILE

    python isograms.py --bnc --indir=INFILE --outfile=OUTFILE

    Replace INDIR/INFILE with the input directory or filename and OUTFILE with the filename for the tidied and reformatted output.

    2.3 Isogram extraction

    After preparing the data as above, isograms can be extracted by running the following command on the reformatted and tidied files:

    python isograms.py --batch --infile=INFILE --outfile=OUTFILE

    Here INFILE should refer to the output from the previous data cleaning process. Please note that the script will actually write two output files: one named OUTFILE with a word list of all the isograms and their associated frequency data, and one named "OUTFILE.totals" with very basic summary statistics.

    2.4 Creating a SQLite3 database

    The output data from the above step can be easily collated into a SQLite3 database which allows for easy querying of the data directly for specific properties. The database can be created by following these steps:

    1. Make sure the files with the Ngrams and BNC data are named "ngrams-isograms.csv" and "bnc-isograms.csv" respectively. (The script assumes you have both of them; if you only want to load one, just create an empty file for the other one.)

    2. Copy the "create-database.sql" script into the same directory as the two data files.

    3. On the command line, go to the directory where the files and the SQL script are.

    4. Type: sqlite3 isograms.db

    5. This will create a database called "isograms.db".

    See section 1 for a basic description of the output data and how to work with the database.

    2.5 Statistical processing

    The repository includes an R script (R version 3) named "statistics.r" that computes a number of statistics about the distribution of isograms by length, frequency, contextual diversity, etc. This can be used as a starting point for running your own stats. It uses RSQLite to access the SQLite database version of the data described above.
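    The included statistics.r is the canonical example of querying the database; purely as a hypothetical sketch (the table names are assumptions, so list the actual tables first), a query from R via RSQLite might look like this:

    # Hypothetical sketch of querying isograms.db from R with RSQLite;
    # replace 'ngrams' with one of the table names returned by dbListTables().
    library(DBI)
    library(RSQLite)

    con <- dbConnect(SQLite(), "isograms.db")
    dbListTables(con)  # inspect the available tables

    # Example: the ten longest second-order isograms, using the columns
    # described in section 1.1
    res <- dbGetQuery(con, "
      SELECT word, length, count_per_million
      FROM ngrams
      WHERE isogramy = 2
      ORDER BY length DESC
      LIMIT 10")
    print(res)

    dbDisconnect(con)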

  4. R codes and dataset for Visualisation of Diachronic Constructional Change...

    • bridges.monash.edu
    • researchdata.edu.au
    zip
    Updated May 30, 2023
    Cite
    Gede Primahadi Wijaya Rajeg (2023). R codes and dataset for Visualisation of Diachronic Constructional Change using Motion Chart [Dataset]. http://doi.org/10.26180/5c844c7a81768
    Available download formats: zip
    Dataset updated: May 30, 2023
    Dataset provided by: Monash University
    Authors: Gede Primahadi Wijaya Rajeg
    License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/ (license information was derived automatically)

    Description

    Publication

    Primahadi Wijaya R., Gede. 2014. Visualisation of diachronic constructional change using Motion Chart. In Zane Goebel, J. Herudjati Purwoko, Suharno, M. Suryadi & Yusuf Al Aried (eds.). Proceedings: International Seminar on Language Maintenance and Shift IV (LAMAS IV), 267-270. Semarang: Universitas Diponegoro. doi: https://doi.org/10.4225/03/58f5c23dd8387

    Description of R codes and data files in the repository

    This repository is imported from its GitHub repo. Versioning of this figshare repository is associated with the GitHub repo's Releases, so check the Releases page for updates (the next version is to include the unified version of the codes in the first release with the tidyverse).

    The raw input data consists of two files (i.e. will_INF.txt and go_INF.txt). They represent the co-occurrence frequency of the top-200 infinitival collocates for will and be going to respectively across the twenty decades of the Corpus of Historical American English (from the 1810s to the 2000s).

    These two input files are used in the R code file 1-script-create-input-data-raw.r. The code preprocesses and combines the two files into a long-format data frame consisting of the following columns: (i) decade, (ii) coll (for "collocate"), (iii) BE going to (for frequency of the collocates with be going to) and (iv) will (for frequency of the collocates with will); it is available in input_data_raw.txt. Then, the script 2-script-create-motion-chart-input-data.R processes input_data_raw.txt, normalising the co-occurrence frequency of the collocates per million words (the COHA size and normalising base frequency are available in coha_size.txt). The output from the second script is input_data_futurate.txt.

    Next, input_data_futurate.txt contains the relevant input data for generating (i) the static motion chart as an image plot in the publication (using the script 3-script-create-motion-chart-plot.R), and (ii) the dynamic motion chart (using the script 4-script-motion-chart-dynamic.R).

    The repository adopts the project-oriented workflow in RStudio; double-click the Future Constructions.Rproj file to open an RStudio session whose working directory is associated with the contents of this repository.
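    The exact code lives in 2-script-create-motion-chart-input-data.R; purely as an illustration of the per-million-words normalisation step it performs, a sketch with made-up frequencies and decade sizes (the real sizes are in coha_size.txt) could look like this:

    # Illustrative sketch only (not the repository's script): normalise raw
    # co-occurrence frequencies per million words by decade.
    library(dplyr)

    # Hypothetical long-format input: decade, collocate, raw frequency
    raw <- data.frame(
      decade = c(1810, 1810, 2000, 2000),
      coll   = c("be", "do", "be", "do"),
      freq   = c(120, 45, 950, 400)
    )

    # Hypothetical decade sizes in total word tokens (cf. coha_size.txt)
    coha_size <- data.frame(
      decade = c(1810, 2000),
      size   = c(1200000, 29500000)
    )

    normalised <- raw %>%
      left_join(coha_size, by = "decade") %>%
      mutate(per_million = freq / size * 1e6)

    print(normalised)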

  5. CLM16gwl NSW Office of Water_GW licence extract linked to spatial...

    • data.gov.au
    • researchdata.edu.au
    • +1more
    Updated Nov 19, 2019
    + more versions
    Cite
    Bioregional Assessment Program (2019). CLM16gwl NSW Office of Water_GW licence extract linked to spatial locations_CLM_v3_13032014 [Dataset]. https://data.gov.au/data/dataset/activity/4b0e74ed-2fad-4608-a743-92163e13c30d
    Dataset updated: Nov 19, 2019
    Dataset provided by: Bioregional Assessment Program
    Area covered: New South Wales
    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme. This dataset was derived from multiple datasets. You can find a link to the parent datasets in the Lineage Field in this metadata statement. The History Field in this metadata statement describes how this dataset was derived.

    The difference between NSW Office of Water GW licences - CLM v2 and v3 is that an additional column, 'Asset Class', has been added that aggregates the purpose of the licence into the set classes for the Asset Database. Also, 'Completed_Depth' has been added, which is the total depth of the groundwater bore. These columns were added for the purpose of the Asset Register.

    The aim of this dataset was to be able to map each groundwater works to its volumetric entitlement without double counting the volume, and to aggregate/disaggregate the data depending on the final use.

    This has not been clipped to the CLM PAE; therefore the number of economic assets/relevant licences will reduce drastically once this occurs.

    The Clarence Moreton groundwater licences includes an extract of all licences that fell within the data management acquisition area as provided by BA to NSW Office of Water.

    Aim: To get a one to one ratio of licences numbers to bore IDs.

    Important notes about data:

    Data has not been clipped to the PAE.

    No decisions have been made in regard to which purposes of groundwater should be protected. Therefore the purpose currently includes groundwater bores that have been drilled for non-extractive purposes, including experimental research, test, monitoring bore, teaching, mineral exploration and groundwater exploration.

    No volume has been included for domestic & stock as it is a basic right. Therefore an arbitrary volume could be applied to account for D&S use.

    Licence Number - Each sheet in the Original Data has a licence number; this is assumed to be the actual licence number. Some are old because they have not been updated to the new WA. Some are new (From_Spreadsheet_WALs). This is the reason for the different codes.

    WA/CA - This number is the 'works' number. It is assumed that the number indicates the bore permit or works approval. This is why there can be multiple works to a licence and multiple licences to a works number. (For the complete glossary see http://registers.water.nsw.gov.au/wma/Glossary.jsp). Originally, the aim was to make sure that when there was more than one licence to a works number, or multiple works to a licence, the multiple instances were complete.

    Clarence Moreton worksheet links the individual licence to a works and a volumetric entitlement. For most sites, this can be linked to a bore which can be found in the NGIS through the HydroID. (\wron\Project\BA\BA_all\Hydrogeology_National_Groundwater_Information_System_v1.1_Sept2013). This will allow analysis of depths, lithology and hydrostratigraphy where the data exists.

    We can aggregate the data based on water source and water management zone as can be seen in the other worksheets.

    Data available:

    Original Data: any data that was brought in from NSW Office of Water; includes:

    Spatial locations provided by NoW - This is exported data from the submitted shape files. Includes the licence (LICENCE) numbers and the bore ID (WORK_NUO). (Refer to lineage NSW Office of Water Groundwater Entitlements Spatial Locations).

    Spreadsheet_WAL - The spreadsheet from the submitted data, WLS-EXTRACT_WALs_volume. (Refer to Lineage NSW Office of Water Groundwater Licence Extract CLM - Oct 2013.)

    WLS_extracts - The combined spreadsheets from the submitted data, WLS-EXTRACT. (Refer to Lineage NSW Office of Water Groundwater Licence Extract CLM - Oct 2013.)

    Aggregated share component to water sharing plan, water source and water management zone

    Dataset History

    The difference between NSW Office of Water GW licences - CLM v2 and v3 is that an additional column has been added, 'Asset Class' that aggregates the purpose of the licence into the set classes for the Asset Database.

    Where purpose = domestic; or domestic & stock; or stock then it was classed as 'basic water right'. Where it is listed as both a domestic/stock and a licensed use such as irrigation, it was classed as a 'water access right.' All other take and use were classed as a 'water access right'. Where purpose = drainage, waste disposal, groundwater remediation, experimental research, null, conveyancing, test bore - these were not given an asset class. Monitoring bores were classed as 'Water supply and monitoring infrastructure'

    Depth has also been included which is the completed depth of the bore.

    Instructions

    Procedure: refer to Bioregional assessment data conversion script.docx

    1) Original spreadsheets have multiple licence instances if there is more than one WA/CA number. This means that there is more than one works or permit for the licence. The aim is to have only one licence instance.

    2) The individual licence numbers were combined into one column.

    3) Using the new column of licence numbers, several vlookups were created to bring in other data. Where the columns are identical in the original spreadsheets, they are combined. The only ones that are not combined are Share/Entitlement/Allocation, as these mean different things.

    4) A HydroID column was created; this is a code that links the NSW data to the NGIS and is basically a ".1.1" at the end of the bore code.

    5) All 'cancelled' licences were removed

    6) A count of the number of works per licence and number of bores were included in the spreadsheet.

    7) Where the ShareComponent = NA, the Entitlement = 0, the Allocation = 0 and there was more than one instance of the same bore, this means that the original licence assigned to the bore has been replaced by a new licence with a share component. Where these criteria were met, the instances were removed.

    8) A volume-per-works column ensures that the volume of the licence is not repeated for each works but is divided by the number of works.

    Bioregional assessment data conversion script

    Aim: The following is the RStudio script for the conversion and merging of the bioregional assessment data.

    Requirements: The user will need RStudio. Some basic knowledge of R is recommended; without it, the only things that really need to be changed are the file locations and names. The way that R reads file paths is different from Windows, and the locations RStudio reads from depend on where RStudio was originally installed to point. This needs to be set up properly before the script can be run.

    Procedure: The information below the dashed line is the script. This can be copied and pasted directly into RStudio. Any text beginning with '#' will not be read as script, so it can be added in and read as an instruction.

    ###########

    # 18/2/2014

    # Code by Brendan Dimech

    #

    # Script to merge extract files from submitted NSW bioregional

    # assessment and convert data into required format. Also use a 'vlookup'

    # process to get Bore and Location information from NGIS.

    #

    # There are 3 scripts, one for each of the individual regions.

    #

    ############

    # CLARENCE MORTON

    # Opening of files. Location can be changed if needed.

    # arc.file is the exported *.csv from the NGIS data which has bore data and Lat/long information.

    # Lat/long weren't in the file natively so were added to the table using Arc Toolbox tools.

    arc.folder = '/data/cdc_cwd_wra/awra/wra_share_01/GW_licencing_and_use_data/Rstudio/Data/Vlookup/Data'

    arc.file = "Moreton.csv"

    # Files from NSW came through in two types. WALS files, this included 'newer' licences that had a share component.

    # The 'OTH' files were older licences that had just an allocation. Some data was similar and this was combined,

    # and other information that wasn't similar from the datasets was removed.

    # This section is locating and importing the WALS and OTH files.

    WALS.folder = '/data/cdc_cwd_wra/awra/wra_share_01/GW_licencing_and_use_data/Rstudio/Data/Vlookup/Data'

    WALS.file = "GW_Clarence_Moreton_WLS-EXTRACT_4_WALs_volume.xls"

    OTH.file.1 = "GW_Clarence_Moreton_WLS-EXTRACT_1.xls"

    OTH.file.2 = "GW_Clarence_Moreton_WLS-EXTRACT_2.xls"

    OTH.file.3 = "GW_Clarence_Moreton_WLS-EXTRACT_3.xls"

    OTH.file.4 = "GW_Clarence_Moreton_WLS-EXTRACT_4.xls"

    newWALS.folder = '/data/cdc_cwd_wra/awra/wra_share_01/GW_licencing_and_use_data/Rstudio/Data/Vlookup/Products'

    newWALS.file = "Clarence_Moreton.csv"

    arc <- read.csv(paste(arc.folder, arc.file, sep="/" ), header =TRUE, sep = ",")

    WALS <- read.table(paste(WALS.folder, WALS.file, sep="/" ), header =TRUE, sep = "\t")

    # Merge any individual WALS and OTH files into a single WALS or OTH file if there were more than one.

    OTH1 <- read.table(paste(WALS.folder, OTH.file.1, sep="/" ), header =TRUE, sep = "\t")

    OTH2 <- read.table(paste(WALS.folder, OTH.file.2, sep="/" ), header =TRUE, sep = "\t")

    OTH3 <- read.table(paste(WALS.folder, OTH.file.3, sep="/" ), header =TRUE, sep = "\t")

    OTH4 <- read.table(paste(WALS.folder, OTH.file.4, sep="/" ), header =TRUE, sep = "\t")

    OTH <- merge(OTH1,OTH2, all.y = TRUE, all.x = TRUE)

    OTH <- merge(OTH,OTH3, all.y = TRUE, all.x = TRUE)

    OTH <- merge(OTH,OTH4, all.y = TRUE, all.x = TRUE)

    # Add new columns to OTH for the BORE, LAT and LONG. Then use 'merge' as a vlookup to add the corresponding

    # bore and location from the arc file. The WALS and OTH files are slightly different because the arc file has

    # a different licence number added in.

    OTH <- data.frame(OTH, BORE = "", LAT = "", LONG = "")

    OTH$BORE <- arc$WORK_NO[match(OTH$LICENSE.APPROVAL, arc$LICENSE)]

    OTH$LAT <- arc$POINT_X[match(OTH$LICENSE.APPROVAL, arc$LICENSE)]

    OTH$LONG <- arc$POINT_Y[match(OTH$LICENSE.APPROVAL, arc$LICENSE)]  # assumed completion (the source listing is cut off here), mirroring the LAT line above

  6. Data from: Citizen science project descriptions as science communication...

    • data.europa.eu
    unknown
    Updated Dec 5, 2022
    Cite
    Zenodo (2022). Citizen science project descriptions as science communication texts - the good, the bad, and the ugly [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-7381002?locale=pt
    Available download formats: unknown (58186)
    Dataset updated: Dec 5, 2022
    Dataset authored and provided by: Zenodo (http://zenodo.org/)
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    This study aimed to determine to what extent CS project descriptions actually contain the kinds of information relevant to prospective participants and whether this information is conveyed in a comprehensible and attractive manner. To this end, we conducted a qualitative content analysis of a random sample of 120 English-language project descriptions stored in the CS Track database. The coding rubric used for this study is based on the ten-step template for writing engaging project descriptions we recently designed and published.

    The sample was produced as follows: After creating a dataset containing only English-language project descriptions, we excluded all descriptions which consist of less than 100 or more than 500 words. Texts of less than 100 words cannot be expected to contain a significant amount of information. Project descriptions of more than 500 words are less likely to be read in their entirety than shorter texts and thus ill-suited to the task of capturing the readers' interest and prompting them to join the project in question. Finally, we applied the 'random' function of RStudio to randomly select 120 texts from the resulting dataset of 1283 descriptions.

    The qualitative content analysis was performed in two consecutive steps. First, in order to ensure that the coding rubric is fit for purpose and all categories within it well-defined and demarcated, all three members of the research team independently coded 40 project descriptions. After discussing the results and making slight modifications to the coding rubric, each team member coded roughly one third of the remaining 80 descriptions.

    Preliminary results suggest that the majority of project descriptions in our sample fail to mention how citizen scientists will benefit from participating, what kind of training they will receive, how their contributions will be acknowledged, and whether they will have access to project results. Furthermore, the project's goals, its target audience, and the tasks volunteers will be expected to complete are very often not described explicitly and clearly enough. For instance, very few project descriptions contain concrete information on required skills and equipment or on the time commitment associated with participation.

    This dataset contains the coding rubric used to analyse project descriptions and a visualisation of preliminary results.
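    As an illustration of the sampling step described above (this code is not part of the dataset; the file and column names are hypothetical), the length filter and random selection might look like this in R:

    # Hypothetical sketch: keep descriptions of 100-500 words, then randomly
    # select 120 of them. The study does not report a random seed.
    set.seed(42)

    projects <- read.csv("cs_track_descriptions.csv", stringsAsFactors = FALSE)

    n_words  <- lengths(strsplit(projects$description, "\\s+"))
    eligible <- projects[n_words >= 100 & n_words <= 500, ]

    sampled <- eligible[sample(nrow(eligible), 120), ]
    nrow(sampled)  # 120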

  7. p53motifDB

    • zenodo.org
    application/gzip, bin +2
    Updated Sep 23, 2024
    Cite
    Morgan Sammons; Morgan Sammons (2024). p53motifDB [Dataset]. http://doi.org/10.5281/zenodo.13351805
    Available download formats: application/gzip, bin, pdf, zip
    Dataset updated: Sep 23, 2024
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors: Morgan Sammons
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    This Zenodo repository contains raw data tables, a Shiny app (via Dockerfile), and a sqlite database that together make up the p53motifDB (p53 motif database).

    The p53motifDB is a compendium of genomic locations in the human hg38 reference genome that contain recognizable DNA sequences that match the binding preferences for the transcription factor p53. Multiple types of genomic, epigenomic, and genome variation data were integrated with these locations in order to let researchers quickly generate hypotheses about novel activities of p53 or validate known behaviors.

    Raw data tables

    The raw data tables (raw_tables.tar.gz) are divided into the "primary" table, containing p53 motif locations and other biographical information relating to those genomic locations. The "accessory" tables contain additional descriptive or quantitative information that can be queried based on the information in the "primary" table. A description of the table schema for the primary table and all accessory tables can be found in Schema_p53motifDB.xlsx.

    Table_1_DataSources.xlsx contains information about all raw and processed data sources that were used in the construction of the p53motifDB.

    Shiny App

    The Shiny App is designed to allow rapid filtering, querying, and downloading of the primary and accessory tables. Users can access a web-based version at https://p53motifDB.its.albany.edu. Users can also deploy the Shiny app locally by downloading and extracting p53motifDB_shiny.zip and doing one of the following:

    Option 1: From the extracted folder, run the included Dockerfile to create a Docker image which will deploy to localhost port 3838.

    Option 2: From the shiny_p53motifDB subfolder, run app.R from R or RStudio. This requires a number of dependencies, which may not be compatible with your current version of R. We highly recommend accessing the Shiny app via the web or through the Dockerfile.

    sqlite Database

    Users can perform more complex database queries (beyond those available in the Shiny app) by first downloading sqlite_db.tar.gz. Unpacking this file will reveal the database file p53motifDB.db. This is a sqlite database file containing the same "primary" and "accessory" data from raw_tables.tar.gz and can be used/queried using standard structured query language. The schema of this database, including relationships between tables, can be seen in p53motifDB_VISUAL_schema.pdf, and additional information about each table and the column contents can be examined in the file Schema_p53motifDB.xlsx.
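    As a minimal, hypothetical sketch (not part of the repository), the database can be opened from R with DBI/RSQLite; since the table names are not listed above, inspect them first:

    # Hypothetical sketch of opening p53motifDB.db from R.
    library(DBI)
    library(RSQLite)

    con <- dbConnect(SQLite(), "p53motifDB.db")
    tables <- dbListTables(con)
    print(tables)

    # Peek at the first few rows of the first table returned above
    head_rows <- dbGetQuery(con, paste0("SELECT * FROM ",
                                        dbQuoteIdentifier(con, tables[1]),
                                        " LIMIT 5"))
    print(head_rows)

    dbDisconnect(con)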

    The gzipped TAR file sqlite_db.tar.gz also contains all of the files and information necessary to reconstruct the p53motifDB.db via R. Users can source the included R script (database_sqlite_commit.R) or open, examine, and run it via RStudio. We strongly advise unpacking the TAR file, which will produce a folder called sqlite_db, and then running the included R script from within that folder, either by sourcing it or by running it line-by-line in RStudio. The result of this script will be p53motifDB.db and an RData object (sqlite_construction.RData) written to the sqlite_db folder.

    If opening and running database_sqlite_commit.R via RStudio, please uncomment line 10 and comment out lines 13 and 14.

    Please also be aware of the minimal package dependencies in R. The included version of p53motifDB.db was created using R (v. 3.4.0) and the following packages (and versions) available via CRAN:

    RSQLite (v. 2.3.7), DBI (v. 1.2.3), tidyverse (v. 2.0.0), and utils (v. 4.3.0).

    Credit

    The p53motifDB was created by Morgan Sammons, Gaby Baniulyte, and Sawyer Hicks.

    Please let us know if you have any questions, comments, or would like additional datasets included in the next version of the p53motifDB by contacting masammons(at)albany.edu

  8. Short-range Early Phase COVID-19 Forecasting R-Project and Data

    • data.mendeley.com
    Updated Dec 15, 2020
    + more versions
    Cite
    Christopher Lynch (2020). Short-range Early Phase COVID-19 Forecasting R-Project and Data [Dataset]. http://doi.org/10.17632/cytrb8p42g.2
    Dataset updated: Dec 15, 2020
    Authors: Christopher Lynch
    License: MIT License, https://opensource.org/licenses/MIT (license information was derived automatically)

    Description

    This R-Project and its data files are provided in support of ongoing research efforts for forecasting COVID-19 cumulative case growth at varied geographic levels. All code and data files are provided to facilitate reproducibility of current research findings. Seven forecasting methods are evaluated with respect to their effectiveness at forecasting one-, three-, and seven-day cumulative COVID-19 cases, including: (1) a Naïve approach; (2) Holt-Winters exponential smoothing; (3) growth rate; (4) moving average (MA); (5) autoregressive (AR); (6) autoregressive moving average (ARMA); and (7) autoregressive integrated moving average (ARIMA). This package is developed to be directly opened and run in RStudio through the provided RProject file. Code developed using R version 3.6.3.
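    Purely as an illustration of one of the seven methods (the description does not state which R packages the package's own scripts use), a one-, three-, and seven-day-ahead ARIMA forecast of cumulative cases might look like this sketch with made-up counts:

    # Illustrative sketch only: ARIMA forecasts of cumulative cases at 1-, 3-,
    # and 7-day horizons using the 'forecast' package and hypothetical data.
    library(forecast)

    # Hypothetical cumulative case counts for 30 consecutive days
    cum_cases <- cumsum(c(1, 2, 2, 3, 5, 8, 10, 14, 17, 22, 28, 33, 41, 52, 60,
                          71, 85, 97, 112, 130, 151, 170, 195, 220, 250, 281,
                          310, 345, 382, 420))

    fit <- auto.arima(ts(cum_cases))
    fc  <- forecast(fit, h = 7)

    # Point forecasts at the 1-, 3-, and 7-day horizons
    fc$mean[c(1, 3, 7)]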

    This software generates the findings of the article entitled "Short-range forecasting of coronavirus disease 2019 (COVID-19) during early onset at county, health district, and state geographic levels: Comparative forecasting approach using seven forecasting methods" using cumulative case counts reported by The New York Times up to April 22, 2020. This package provides two avenues for reproducing results: 1) Regenerate the forecasts from scratch using the provided code and data files and then run the analyses; or 2) Load the saved forecast data and run the analyses on the existing data

    License info can be viewed from the "License Info.txt" file.

    The "RProject" folder contains the RProject file which opens the project in RStudio with the desired working directory set.

    README files are contained in each sub-folder and provide additional detail on the contents of the folder.

    Copyright (c) 2020 Christopher J. Lynch and Ross Gore

    Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

    Except as contained in this notice, the name(s) of the above copyright holders shall not be used in advertising or otherwise to promote the sale, use, or other dealings in this Software without prior written authorization.

  9. Ski jumping results database

    • kaggle.com
    zip
    Updated Feb 2, 2021
    Cite
    Wiktor Florek (2021). Ski jumping results database [Dataset]. https://www.kaggle.com/wrotki8778/ski-jumping-results-database-2009now
    Available download formats: zip (4133244 bytes)
    Dataset updated: Feb 2, 2021
    Authors: Wiktor Florek
    License: GNU General Public License v2.0, http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Context

    Hello. As a big ski jumping fan, I would like to invite everybody to join a project called "Ski Jumping Data Center". Its primary goal is as follows:

    Collect as much data about ski jumping as possible and create as many useful insights from it as possible.

    In mid-September last year (12.09.20) I thought, "Hmm, I don't know of any statistical analyses of ski jumping." In fact, the only easily found public data analysis about SJ that I know of is https://rstudio-pubs-static.s3.amazonaws.com/153728_02db88490f314b8db409a2ce25551b82.html

    The question is: why? This discipline is in fact overloaded with data, but almost nobody has taken the topic seriously. Therefore I decided to start collecting and analyzing the data. However, the amount of work needed to capture the various data (i.e. jumps and results of competitions) was so big, and there are so many ways to use this information, that making it public was obvious. In fact, I plan to expand the database to be as big as possible, but that requires more time and (hopefully) more help.

    Content

    The data below is (in a broad sense) created by merging a large number (>6000) of PDFs with the results of almost 4000 ski jumping competitions organized between (roughly) 2009 and 2021. Creating this dataset cost me about 150 hours of coding and parsing and over 4 months of hard work. My current algorithm can parse the results of new events almost instantly, so this dataset can easily be extended. For details see the GitHub page: https://github.com/wrotki8778/Ski_jumping_data_center. The observations contain standard information about every jump: style points, distance, take-off speed, wind, etc. The main advantage of this dataset is the number of jumps: it is quite high (at the time of uploading, almost 250,000 rows), so the data can be analyzed in various ways, even though the number of columns is not as extensive.

    Acknowledgements

    A big "thank you" goes to the creators of the tika package, because without their contribution I probably wouldn't have created this dataset at all.

    Inspiration

    I plan to derive at least a few insights from this data: 1) Are the wind/gate factors well adjusted? 2) How strong is the correlation between distance and style marks? Is the judging always fair? 3) (advanced) Can we create a model that predicts the performance/distance of an athlete in a given competition? Maybe a deep learning model? 4) Which characteristics of athletes (height, weight, etc.) are important for achieving the best jumps? (A small code sketch for question 2 follows below.)
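    As an illustration of question 2 above (the column and file names here are hypothetical; check the actual files in the dataset), the distance-style correlation could be computed like this:

    # Hypothetical sketch: correlation between jump distance and style points.
    jumps <- read.csv("jumps.csv")  # one row per jump, as described above

    cor(jumps$distance, jumps$style_points, use = "complete.obs")

    # Quick visual check of the same relationship
    plot(jumps$distance, jumps$style_points,
         xlab = "Distance (m)", ylab = "Style points", pch = 20)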

  10. Data files and code on the comparison of SARS-CoV-2 with non-segmented RNA...

    • springernature.figshare.com
    pptx
    Updated Jun 1, 2023
    Cite
    Xiaodi Chen (2023). Data files and code on the comparison of SARS-CoV-2 with non-segmented RNA viruses [Dataset]. http://doi.org/10.6084/m9.figshare.12482813.v1
    Available download formats: pptx
    Dataset updated: Jun 1, 2023
    Dataset provided by: Figshare (http://figshare.com/)
    Authors: Xiaodi Chen
    License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/ (license information was derived automatically)

    Description

    This fileset contains 15 data files and 1 ReadMe file. The data files are as follows:

    Five results files in .fasta file format: Result_MacroDomain_.fasta, Result_Spike1_.fasta, Result_Spike2_.fasta, Result_Spike2protein_.fasta and Result_Viroporin_.fasta.

    Two PowerPoint presentations (.pptx file format): Analysis by MegaX_.pptx and Open Reading Frames_Conserved Domain Found in ORF and CDD_Children_.pptx.

    Three data files in .nwk file format: NeurotropicRNA.noSegmentR1_original tree.nwk, MacroDomainGen_BootstrapTree.nwk and ViroporinGen_BootstrapTree.nwk.

    One code file in .R file format: Protein Alignment RStudio_msa package_.R.

    One file in .tex file format: Covid19_.tex.

    Two files in .txt file format: Covid19_.txt and texshade.sty package_.txt.

    One file in .sty file format: texshade_.sty.

    The 5 fasta files contain the results of the multiple protein sequence alignment. The PowerPoint presentation Open Reading Frames_Conserved Domain Found in ORF and CDD_Children.pptx contains the search results (snapshot figures) obtained with the Open Reading Frame (ORF) finder and the Conserved Domains Database (CDD, NCBI); it provides evidence of how Figures 2 and 3 were made. The PowerPoint presentation Analysis by MegaX_.pptx contains the evidence (parameters) showing how the sequences were aligned and how the tree files for Figure 1 were made with the MegaX software. The three .nwk files (in Newick tree format) were produced using the MEGAX software; they contain the data used to construct the phylogenetic trees shown in Figures 1, 2B and 2C of the article. The R file contains all the code required to produce Figures 2 and 3 in the article. The Covid19.tex file works together with RStudio and the msa package (an R package for multiple sequence alignment) to make Figures 2 and 3 in the article. The .sty file is a style file for LaTeX and contains code.

    Study aims and methodology: The primary objective of the current study was to determine the possible evolutionary and molecular relationships between SARS-CoV-2 and non-segmented RNA viruses, especially the viruses that can infect the nervous system in infants and children. The whole-genome sequences of 35 non-segmented RNA viruses, including 13 CoVs, were retrieved from the National Center for Biotechnology Information (NCBI) for the purpose of phylogenetic analysis, which was conducted with MEGAX (Penn State University, PA, USA). All genomic sequences were aligned with the ClustalW algorithm, and phylogenetic prediction was inferred by the maximum likelihood method and Tamura-Nei model. RStudio (RStudio, Inc., Boston, MA, USA) with the msa package was used for multiple protein sequence alignment. For more details on the methodology, please read the related article.
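    The dataset's own code is in Protein Alignment RStudio_msa package_.R; as a minimal, hypothetical sketch of the kind of step it performs, a multiple protein sequence alignment with the Bioconductor msa package (ClustalW, as named above) followed by a TeXshade rendering might look like this. The input file name is an assumption, since the included fasta files already hold alignment results.

    # Hypothetical sketch (not the dataset's script): align protein sequences
    # with the 'msa' package and render the alignment with TeXshade.
    # install.packages("BiocManager"); BiocManager::install("msa")
    library(msa)  # also attaches Biostrings

    seqs <- readAAStringSet("spike1_proteins.fasta")  # hypothetical unaligned input

    aln <- msa(seqs, method = "ClustalW")
    print(aln, show = "complete")

    # LaTeX (TeXshade) rendering of the alignment, as used for the figures
    msaPrettyPrint(aln, output = "tex", showNames = "left",
                   showLogo = "none", askForOverwrite = FALSE)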
