100+ datasets found
  1. Data from: Data to create and evaluate distribution models for invasive...

    • catalog.data.gov
    • data.usgs.gov
    • +1 more
    Updated Oct 1, 2025
    Cite
    U.S. Geological Survey (2025). Data to create and evaluate distribution models for invasive species for different geographic extents [Dataset]. https://catalog.data.gov/dataset/data-to-create-and-evaluate-distribution-models-for-invasive-species-for-different-geograp
    Explore at:
    Dataset updated
    Oct 1, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    We developed habitat suitability models for invasive plant species selected by Department of Interior land management agencies. We applied the modeling workflow developed in Young et al. 2020 to species not included in the original case studies. Our methodology balanced trade-offs between developing highly customized models for a few species versus fitting non-specific and generic models for numerous species. We developed a national library of environmental variables known to physiologically limit plant distributions (Engelstad et al. 2022 Table S1: https://doi.org/10.1371/journal.pone.0263056) and relied on human input based on natural history knowledge to further narrow the variable set for each species before developing habitat suitability models. We developed models using five algorithms with VisTrails: Software for Assisted Habitat Modeling [SAHM 2.1.2]. We accounted for uncertainty related to sampling bias by using two alternative sources of background samples, and constructed model ensembles using the 10 models for each species (five algorithms by two background methods) for three different thresholds (conservative to targeted). The mergedDataset_regionalization.csv file contains predictor values associated with pixels underlying each presence and background point. The testStripPoints_regionalization.csv file contains the locations of the modeled species occurring in the different geographic test strips.
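    As a minimal sketch (not part of the release), the two files named above can be loaded for inspection with pandas; the file names come from the description, but no column layout is assumed:

```python
# Hedged sketch: inspect the two CSVs described above. Assumes pandas and
# local copies of the files; their column layouts are not specified in the
# description, so nothing beyond the file names is assumed.
import pandas as pd

# Predictor values for pixels under each presence and background point.
merged = pd.read_csv("mergedDataset_regionalization.csv")
# Locations of the modeled species in the geographic test strips.
strips = pd.read_csv("testStripPoints_regionalization.csv")

print(merged.shape, list(merged.columns))
print(strips.head())
```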

  2. Tutorial: How to use Google Data Studio and ArcGIS Online to create an...

    • hydroshare.org
    • dataone.org
    • +1more
    zip
    Updated Jul 31, 2020
    Cite
    Sarah Beganskas (2020). Tutorial: How to use Google Data Studio and ArcGIS Online to create an interactive data portal [Dataset]. http://doi.org/10.4211/hs.9edae0ef99224e0b85303c6d45797d56
    Explore at:
    zip (2.9 MB)
    Dataset updated
    Jul 31, 2020
    Dataset provided by
    HydroShare
    Authors
    Sarah Beganskas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This tutorial will teach you how to take time-series data from many field sites and create a shareable online map, where clicking on a field location brings you to a page with interactive graph(s).

    The tutorial can be completed with a sample dataset (provided via a Google Drive link within the document) or with your own time-series data from multiple field sites.

    Part 1 covers how to make interactive graphs in Google Data Studio and Part 2 covers how to link data pages to an interactive map with ArcGIS Online. The tutorial will take 1-2 hours to complete.

    An example interactive map and data portal can be found at: https://temple.maps.arcgis.com/apps/View/index.html?appid=a259e4ec88c94ddfbf3528dc8a5d77e8

  3. create-health-data

    • huggingface.co
    Updated May 3, 2024
    Cite
    HealthChat (2024). create-health-data [Dataset]. https://huggingface.co/datasets/huhucheck/create-health-data
    Explore at:
    Dataset updated
    May 3, 2024
    Dataset authored and provided by
    HealthChat
    Description

    huhucheck/create-health-data dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. SQL Practice File 1

    • kaggle.com
    zip
    Updated May 10, 2024
    Cite
    Sanjana Murthy (2024). SQL Practice File 1 [Dataset]. https://www.kaggle.com/datasets/sanjanamurthy392/sql-file-1-create-database-use-create-etc
    Explore at:
    zip (431 bytes)
    Dataset updated
    May 10, 2024
    Authors
    Sanjana Murthy
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This data contains SQL practice statements: CREATE DATABASE, USE, CREATE TABLE (int, varchar, date), DESCRIBE, ALTER TABLE (ADD, MODIFY, CHAR, VARCHAR, AFTER, RENAME COLUMN, TO, DROP COLUMN, DROP), SHOW TABLES, RENAME TABLE (TO), and DROP TABLE; a sketch of these operations follows below.
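    A hedged sketch of the statements the practice file covers, run against SQLite from Python. The file itself targets MySQL-style SQL; CREATE DATABASE, USE, DESCRIBE, and SHOW TABLES have no direct SQLite equivalent, so the nearest substitutes appear in comments, and the table and column names are invented for illustration:

```python
# Illustrative only: table/column names are hypothetical, and MySQL-only
# statements (CREATE DATABASE, USE, DESCRIBE, SHOW TABLES) are replaced
# with their closest SQLite counterparts.
import sqlite3

conn = sqlite3.connect(":memory:")   # stands in for CREATE DATABASE + USE
cur = conn.cursor()

cur.execute("CREATE TABLE staff (id INT, name VARCHAR(50), hired DATE)")
cur.execute("ALTER TABLE staff ADD COLUMN dept VARCHAR(30)")
cur.execute("ALTER TABLE staff RENAME COLUMN dept TO department")  # SQLite >= 3.25
cur.execute("ALTER TABLE staff DROP COLUMN department")            # SQLite >= 3.35

print(cur.execute("PRAGMA table_info(staff)").fetchall())          # ~ DESCRIBE staff
print(cur.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall())  # ~ SHOW TABLES

cur.execute("ALTER TABLE staff RENAME TO employees")               # RENAME TABLE ... TO
cur.execute("DROP TABLE employees")
conn.close()
```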

  5. 02.1 Integrating Data in ArcGIS Pro

    • hub.arcgis.com
    Updated Feb 16, 2017
    Cite
    Iowa Department of Transportation (2017). 02.1 Integrating Data in ArcGIS Pro [Dataset]. https://hub.arcgis.com/documents/cd5acdcc91324ea383262de3ecec17d0
    Explore at:
    Dataset updated
    Feb 16, 2017
    Dataset authored and provided by
    Iowa Department of Transportation (https://iowadot.gov/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    You have been assigned a new project, which you have researched, and you have identified the data that you need. The next step is to gather, organize, and potentially create the data that you need for your project analysis. In this course, you will learn how to gather and organize data using ArcGIS Pro. You will also create a file geodatabase where you will store the data that you import and create. After completing this course, you will be able to perform the following tasks:
    • Create a geodatabase in ArcGIS Pro.
    • Create feature classes in ArcGIS Pro by exporting and importing data.
    • Create a new, empty feature class in ArcGIS Pro.

  6. Data from: How to create dataset within AfricaRice Dataverse

    • dataverse.harvard.edu
    Updated Feb 20, 2019
    Cite
    Ibnou Dieng (2019). How to create dataset within AfricaRice Dataverse [Dataset]. http://doi.org/10.7910/DVN/H3O8OT
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 20, 2019
    Dataset provided by
    Harvard Dataverse
    Authors
    Ibnou Dieng
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This document provides information to the user when creating a dataset within AfricaRice Dataverse.

  7. Create external links on data.wa.gov

    • catalog.data.gov
    Updated Nov 8, 2025
    Cite
    data.wa.gov (2025). Create external links on data.wa.gov [Dataset]. https://catalog.data.gov/dataset/create-external-links-on-data-wa-gov
    Explore at:
    Dataset updated
    Nov 8, 2025
    Dataset provided by
    data.wa.gov
    Description

    This page provides instructions for data.wa.gov publishers on creating an external link page to a data source.

  8. Data Engg data

    • kaggle.com
    Updated Jun 26, 2021
    Cite
    Apurba Sarkar (2021). Data Engg data [Dataset]. https://www.kaggle.com/apurbasarkar/data-engg-data/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 26, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Apurba Sarkar
    Description

    Now based on the above two tables (UserTable and VisitorLogData), you need to create an input feature set for the Marketing Model.

    Input Feature table:

    UserID: Unique ID of the registered user.
    No_of_days_Visited_7_Days: How many days the user was active on the platform in the last 7 days.
    No_Of_Products_Viewed_15_Days: Number of products viewed by the user in the last 15 days.
    User_Vintage: Vintage (in days) of the user as of today.
    Most_Viewed_product_15_Days: Most frequently viewed (by page loads) product in the last 15 days. If multiple products have a similar number of page loads, take the most recent one; if the user has not viewed any product in the last 15 days, use Product101.
    Most_Active_OS: Most frequently used OS by the user.
    Recently_Viewed_Product: Most recently viewed (by page loads) product. If the user has not viewed any product, use Product101.
    Pageloads_last_7_days: Count of page loads by the user in the last 7 days.
    Clicks_last_7_days: Count of clicks by the user in the last 7 days.

    Process to create Input Feature:

    In the current case, you are supposed to generate the input feature set as of 28-May-2018, so the visitor table covers 07-May-2018 to 27-May-2018.

    As a data engineer, building an ETL pipeline here would be appreciated and would give you an added advantage in interviews: given the user data and log data, the pipeline should generate the input feature table automatically. A partial sketch follows below.
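    A minimal pandas sketch of a few of the features above, computed as of 28-May-2018. The VisitorLogData column names (UserID, VisitDateTime, Activity) and the file name are assumptions about the schema, not taken from the dataset itself:

```python
# Hedged sketch: derive three of the listed features from the visitor log.
# Schema assumptions: columns UserID, VisitDateTime, Activity, with
# Activity values "pageload" / "click"; the file name is hypothetical.
import pandas as pd

AS_OF = pd.Timestamp("2018-05-28")

log = pd.read_csv("VisitorLogData.csv", parse_dates=["VisitDateTime"])
last7 = log[log["VisitDateTime"] >= AS_OF - pd.Timedelta(days=7)]

features = pd.DataFrame({
    # Distinct days the user was active on the platform in the last 7 days.
    "No_of_days_Visited_7_Days": last7.groupby("UserID")["VisitDateTime"]
                                      .apply(lambda s: s.dt.date.nunique()),
    # Page loads and clicks in the last 7 days.
    "Pageloads_last_7_days": last7[last7["Activity"] == "pageload"]
                                  .groupby("UserID").size(),
    "Clicks_last_7_days": last7[last7["Activity"] == "click"]
                               .groupby("UserID").size(),
}).fillna(0).astype(int)

print(features.head())
```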

  9. How to create an Okta Account

    • researchdata.edu.au
    • data.nsw.gov.au
    Updated Oct 24, 2025
    + more versions
    Cite
    Spatial Services (DCS) (2025). How to create an Okta Account [Dataset]. https://researchdata.edu.au/how-create-an-okta-account/3403035
    Explore at:
    Dataset updated
    Oct 24, 2025
    Dataset provided by
    data.nsw.gov.au
    Authors
    Spatial Services (DCS)
    Description


    Metadata Portal Metadata Information

    Content Title: How to create an Okta Account
    Content Type: Document
    Description: Documentation on how to create an Okta Account
    Initial Publication Date: 09/07/2024
    Data Currency: 09/07/2024
    Data Update Frequency: Other
    Content Source: Data provider files
    File Type: Document
    Attribution:
    Data Theme, Classification or Relationship to other Datasets:
    Accuracy:
    Spatial Reference System (dataset): Other
    Spatial Reference System (web service): Other
    WGS84 Equivalent To: Other
    Spatial Extent:
    Content Lineage:
    Data Classification: Unclassified
    Data Access Policy: Open
    Data Quality:
    Terms and Conditions: Creative Commons
    Standard and Specification:
    Data Custodian: Customer Hub
    Point of Contact: Customer Hub
    Data Aggregator:
    Data Distributor:
    Additional Supporting Information:
    TRIM Number:

  10. R codes and dataset for Visualisation of Diachronic Constructional Change...

    • researchdata.edu.au
    • bridges.monash.edu
    Updated Apr 1, 2019
    Cite
    Gede Primahadi Wijaya Rajeg (2019). R codes and dataset for Visualisation of Diachronic Constructional Change using Motion Chart [Dataset]. http://doi.org/10.26180/5c844c7a81768
    Explore at:
    Dataset updated
    Apr 1, 2019
    Dataset provided by
    Monash University
    Authors
    Gede Primahadi Wijaya Rajeg
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Publication


    Primahadi Wijaya R., Gede. 2014. Visualisation of diachronic constructional change using Motion Chart. In Zane Goebel, J. Herudjati Purwoko, Suharno, M. Suryadi & Yusuf Al Aried (eds.). Proceedings: International Seminar on Language Maintenance and Shift IV (LAMAS IV), 267-270. Semarang: Universitas Diponegoro. doi: https://doi.org/10.4225/03/58f5c23dd8387

    Description of R codes and data files in the repository

    This repository is imported from its GitHub repo. Versioning of this figshare repository is associated with the GitHub repo's Release. So, check the Releases page for updates (the next version is to include the unified version of the codes in the first release with the tidyverse).

    The raw input data consists of two files (i.e. will_INF.txt and go_INF.txt). They represent the co-occurrence frequency of top-200 infinitival collocates for will and be going to respectively across the twenty decades of Corpus of Historical American English (from the 1810s to the 2000s).

    These two input files are used in the R code file 1-script-create-input-data-raw.r. The codes preprocess and combine the two files into a long format data frame consisting of the following columns: (i) decade, (ii) coll (for "collocate"), (iii) BE going to (for frequency of the collocates with be going to) and (iv) will (for frequency of the collocates with will); it is available in the input_data_raw.txt.

    Then, the script 2-script-create-motion-chart-input-data.R processes the input_data_raw.txt for normalising the co-occurrence frequency of the collocates per million words (the COHA size and normalising base frequency are available in coha_size.txt). The output from the second script is input_data_futurate.txt.

    Next, input_data_futurate.txt contains the relevant input data for generating (i) the static motion chart as an image plot in the publication (using the script 3-script-create-motion-chart-plot.R), and (ii) the dynamic motion chart (using the script 4-script-motion-chart-dynamic.R).

    The repository adopts the project-oriented workflow in RStudio; double-click on the Future Constructions.Rproj file to open an RStudio session whose working directory is associated with the contents of this repository.
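    The published scripts are R; purely to illustrate the first preprocessing step described above (combining the two frequency files into one long-format table), here is a schematic Python equivalent. The tab-separated layout and the column names assumed for will_INF.txt and go_INF.txt are guesses, not taken from the repository:

```python
# Schematic re-sketch of step 1 (the real pipeline is the R script
# 1-script-create-input-data-raw.r). Assumes each input file is
# tab-separated with columns: decade, coll, freq.
import pandas as pd

will = pd.read_csv("will_INF.txt", sep="\t")   # collocate frequencies of will
going = pd.read_csv("go_INF.txt", sep="\t")    # collocate frequencies of be going to

raw = (will.rename(columns={"freq": "will"})
           .merge(going.rename(columns={"freq": "BE going to"}),
                  on=["decade", "coll"], how="outer")
           .fillna(0))

raw.to_csv("input_data_raw.txt", sep="\t", index=False)
```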

  11. Coastal Ferry Routes - Create Your First Data Pipeline

    • anrgeodata.vermont.gov
    Updated Mar 21, 2023
    Cite
    GP Analysis - Prod Hive 1 (2023). Coastal Ferry Routes - Create Your First Data Pipeline [Dataset]. https://anrgeodata.vermont.gov/datasets/5516ef1c4db846fab0a34a34626c263e
    Explore at:
    Dataset updated
    Mar 21, 2023
    Dataset authored and provided by
    GP Analysis - Prod Hive 1
    License

    https://www2.gov.bc.ca/gov/content?id=A519A56BC2BF44E4A008B33FCF527F61

    Description

    Use this GeoJSON file as an input dataset in Data Pipelines. To get started, follow the steps in the Create your first data pipeline tutorial. To learn more about Data Pipelines, see Introduction to Data Pipelines.

  12. Footprints and producers of source data used to create southern portion of...

    • catalog.data.gov
    • data.usgs.gov
    • +1 more
    Updated Nov 26, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Footprints and producers of source data used to create southern portion of the high-resolution (1 m) San Francisco Bay, California, digital elevation model (DEM) [Dataset]. https://catalog.data.gov/dataset/footprints-and-producers-of-source-data-used-to-create-southern-portion-of-the-high-resolu
    Explore at:
    Dataset updated
    Nov 26, 2025
    Dataset provided by
    U.S. Geological Survey
    Area covered
    San Francisco Bay, California
    Description

    Polygon shapefile showing the footprint boundaries, source agency origins, and resolutions of compiled bathymetric digital elevation models (DEMs) used to construct a continuous, high-resolution DEM of the southern portion of San Francisco Bay.

  13. Data from: Create a Project

    • hub.arcgis.com
    Updated Jan 17, 2019
    Cite
    State of Delaware (2019). Create a Project [Dataset]. https://hub.arcgis.com/documents/4f4c09e4004446b08826e39bd04eb418
    Explore at:
    Dataset updated
    Jan 17, 2019
    Dataset authored and provided by
    State of Delaware
    Description

    An ArcGIS Pro project may contain maps, scenes, layouts, data, tools, and other items. It may contain connections to folders, databases, and servers. Content can be added from online portals such as your ArcGIS organization or the ArcGIS Living Atlas of the World. In this tutorial, you'll create a new, blank ArcGIS Pro project. You'll add a map to the project and convert the map to a 3D scene.
    Estimated time: 10 minutes
    Software requirements: ArcGIS Pro

  14. Create Price Prediction Data

    • coinbase.com
    Updated Nov 13, 2025
    + more versions
    Cite
    (2025). Create Price Prediction Data [Dataset]. https://www.coinbase.com/price-prediction/base-create-1b07
    Explore at:
    Dataset updated
    Nov 13, 2025
    Variables measured
    Growth Rate, Predicted Price
    Measurement technique
    User-defined projections based on compound growth. This is not a formal financial forecast.
    Description

    This dataset contains the predicted prices of the asset Create over the next 16 years. The data is calculated initially using a default 5 percent annual growth rate; after page load, a sliding-scale component lets the user adjust the growth rate to their own positive or negative projections. The maximum adjustable growth rate is 100 percent and the minimum is -100 percent.
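    A worked sketch of the stated method: compound growth from a current price at a user-adjustable annual rate clipped to [-100, 100] percent. The starting price below is illustrative, not taken from the dataset:

```python
# Hedged sketch of the page's projection arithmetic; the starting price
# below is made up for illustration.
def projected_prices(price_now: float, growth_pct: float, years: int = 16) -> list[float]:
    """Project prices forward with annual compounding at growth_pct."""
    rate = max(-100.0, min(100.0, growth_pct)) / 100.0  # clip to [-100, 100] percent
    return [price_now * (1 + rate) ** y for y in range(1, years + 1)]

# Default 5 percent growth, as the page uses on load.
print(projected_prices(price_now=0.01, growth_pct=5.0)[:3])
```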

  15. Kelley Create Locations Data for United States

    • poidata.io
    csv, json
    Updated Oct 20, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Business Data Provider (2025). Kelley Create Locations Data for United States [Dataset]. https://poidata.io/brand-report/kelley-create/united-states
    Explore at:
    csv, json
    Dataset updated
    Oct 20, 2025
    Dataset authored and provided by
    Business Data Provider
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2025
    Area covered
    United States
    Variables measured
    Website URL, Phone Number, Review Count, Business Name, Email Address, Business Hours, Customer Rating, Business Address, Brand Affiliation, Geographic Coordinates
    Description

    Comprehensive dataset containing 30 verified Kelley Create locations in the United States with complete contact information, ratings, reviews, and location data.

  16. Simulation data and code

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    zip
    Updated Feb 24, 2022
    Cite
    Charlotte de Vries; E Yagmur Erten (2022). Simulation data and code [Dataset]. http://doi.org/10.6084/m9.figshare.19232535.v1
    Explore at:
    zip
    Dataset updated
    Feb 24, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Charlotte de Vries; E Yagmur Erten
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description
    • PF_simulation_data.zip contains simulation data to create figure 2 of de Vries, Erten and Kokko.
    • Code_PF.zip contains C++ code to create the data used to create figure 2 (see PF_simulation_data.zip for the data files produced), and it also contains the R script to create figure 2 from the data (Figure2_cloud_25.R). All code files were created by Pen, I., & Flatt, T. (2021). Asymmetry, division of labour and the evolution of ageing in multicellular organisms. Philosophical Transactions of the Royal Society B, 376(1823), 20190729. The C++ code is slightly adjusted to change output. Note that the R script takes a long time to run (multiple days on our laptops) and uses a lot of swap memory; we advise running it on a server. Alternatively, you can edit the code to use fewer than the last 25 days by changing the line ddead %>% filter(t > 4975) to, for example, ddead %>% filter(t > 4998) to use the last 2 time steps only. However, note that there will be insufficient data at high ages to estimate mortality rates.
  17. Documentation of R scripts to create boxplots of change factors by NOAA...

    • datasets.ai
    • data.usgs.gov
    • +1 more
    Updated Jul 19, 2023
    + more versions
    Cite
    Department of the Interior (2023). Documentation of R scripts to create boxplots of change factors by NOAA Atlas 14 station, or for all stations in a Florida HUC-8 basin or county (Documentation_R_script_create_boxplot.docx) [Dataset]. https://datasets.ai/datasets/documentation-of-r-scripts-to-create-boxplots-of-change-factors-by-noaa-atlas-14-station-o-fa3c6
    Explore at:
    Dataset updated
    Jul 19, 2023
    Dataset authored and provided by
    Department of the Interior
    Area covered
    Florida
    Description

    The Florida Flood Hub for Applied Research and Innovation and the U.S. Geological Survey have developed projected future change factors for precipitation depth-duration-frequency (DDF) curves at 242 National Oceanic and Atmospheric Administration (NOAA) Atlas 14 stations in Florida. The change factors were computed as the ratio of projected future to historical extreme-precipitation depths fitted to extreme-precipitation data from downscaled climate datasets using a constrained maximum likelihood (CML) approach as described in https://doi.org/10.3133/sir20225093. The change factors correspond to the periods 2020-59 (centered in the year 2040) and 2050-89 (centered in the year 2070) as compared to the 1966-2005 historical period.
    An R script (create_boxplot.R) is provided which generates boxplots of change factors by NOAA Atlas 14 station, or for all NOAA Atlas 14 stations in a Florida HUC-8 basin or county. In addition, the R script basin_boxplot.R is provided as an example on how to create a wrapper function that will automate the generation of boxplots of change factors for all Florida HUC-8 basins. This Microsoft Word file (Documentation_R_script_create_boxplot.docx) serves as documentation on the code usage and available options for running the scripts. As described in the documentation, the R scripts rely on some of the Microsoft Excel spreadsheets published as part of this data release. The script uses basins defined in the "Florida Hydrologic Unit Code (HUC) Basins (areas)" from the Florida Department of Environmental Protection (FDEP; https://geodata.dep.state.fl.us/datasets/FDEP::florida-hydrologic-unit-code-huc-basins-areas/explore) and their names are listed in the file basins_list.txt provided with the script. County names are listed in the file counties_list.txt provided with the script. NOAA Atlas 14 stations located in each Florida HUC-8 basin or county are defined in the Microsoft Excel spreadsheet Datasets_station_information.xlsx which is part of this data release. Instructions are provided in code documentation (see highlighted text on page 7 of Documentation_R_script_create_boxplot.docx) so that users can modify the script to generate boxplots for basins different from the FDEP "Florida Hydrologic Unit Code (HUC) Basins (areas)."
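    The published scripts are in R and operate on the Excel spreadsheets named above; the following Python sketch only illustrates the idea of a boxplot of change factors grouped by station, assuming a hypothetical flat extract with station and change_factor columns:

```python
# Schematic analogue of create_boxplot.R, not the released script.
# Assumes a hypothetical CSV with columns: station, change_factor.
import pandas as pd
import matplotlib.pyplot as plt

cf = pd.read_csv("change_factors.csv")

cf.boxplot(column="change_factor", by="station", rot=90)
plt.ylabel("Change factor (projected / historical depth)")
plt.suptitle("")        # drop pandas' automatic "grouped by" title
plt.tight_layout()
plt.show()
```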

  18. Replication Data for: Revisiting 'The Rise and Decline' in a Population of...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 22, 2023
    Cite
    TeBlunthuis, Nathan; Aaron Shaw; Benjamin Mako Hill (2023). Replication Data for: Revisiting 'The Rise and Decline' in a Population of Peer Production Projects [Dataset]. http://doi.org/10.7910/DVN/SG3LP1
    Explore at:
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    TeBlunthuis, Nathan; Aaron Shaw; Benjamin Mako Hill
    Description

    This archive contains code and data for reproducing the analysis for “Replication Data for Revisiting ‘The Rise and Decline’ in a Population of Peer Production Projects”. Depending on what you hope to do with the data you probably do not want to download all of the files. Depending on your computation resources you may not be able to run all stages of the analysis.

    The code for all stages of the analysis, including typesetting the manuscript and running the analysis, is in code.tar. If you only want to run the final analysis or to play with datasets used in the analysis of the paper, you want intermediate_data.7z or the uncompressed tab and csv files.

    The data files are created in a four-stage process. The first stage uses the program “wikiq” to parse mediawiki xml dumps and create tsv files that have edit data for each wiki. The second stage generates the all.edits.RDS file, which combines these tsvs into a dataset of edits from all the wikis. This file is expensive to generate and at 1.5GB is pretty big. The third stage builds smaller intermediate files that contain the analytical variables from these tsv files. The fourth stage uses the intermediate files to generate smaller RDS files that contain the results. Finally, knitr and latex typeset the manuscript. A stage will only run if the outputs from the previous stages do not exist, so if the intermediate files exist they will not be regenerated and only the final analysis will run. The exception is that stage 4, fitting models and generating plots, always runs. If you only want to replicate from the second stage onward, you want wikiq_tsvs.7z. If you want to replicate everything, you want wikia_mediawiki_xml_dumps.7z.001, wikia_mediawiki_xml_dumps.7z.002, and wikia_mediawiki_xml_dumps.7z.003. These instructions work backwards from building the manuscript using knitr, loading the datasets, running the analysis, to building the intermediate datasets.

    Building the manuscript using knitr: This requires working latex, latexmk, and knitr installations. Depending on your operating system you might install these packages in different ways. On Debian Linux you can run apt install r-cran-knitr latexmk texlive-latex-extra. Alternatively, you can upload the necessary files to a project on Overleaf.com. Download code.tar; this has everything you need to typeset the manuscript. Unpack the tar archive (on a unix system, tar xf code.tar) and navigate to code/paper_source. Install R dependencies: in R, run install.packages(c("data.table","scales","ggplot2","lubridate","texreg")). On a unix system you should be able to run make to build the manuscript generalizable_wiki.pdf. Otherwise you should try uploading all of the files (including the tables, figure, and knitr folders) to a new project on Overleaf.com.

    Loading intermediate datasets: The intermediate datasets are found in the intermediate_data.7z archive. They can be extracted on a unix system using the command 7z x intermediate_data.7z. The files are 95MB uncompressed. These are RDS (R data set) files and can be loaded in R using readRDS, for example newcomer.ds <- readRDS("newcomers.RDS"). If you wish to work with these datasets using a tool other than R, you might prefer to work with the .tab files.

    Running the analysis: Fitting the models may not work on machines with less than 32GB of RAM. If you have trouble, you may find the functions in lib-01-sample-datasets.R useful to create stratified samples of data for fitting models; see line 89 of 02_model_newcomer_survival.R for an example. Download code.tar and intermediate_data.7z to your working folder and extract both archives (on a unix system, tar xf code.tar && 7z x intermediate_data.7z). Install R dependencies: install.packages(c("data.table","ggplot2","urltools","texreg","optimx","lme4","bootstrap","scales","effects","lubridate","devtools","roxygen2")). On a unix system you can simply run regen.all.sh to fit the models, build the plots and create the RDS files.

    Generating datasets, building the intermediate files: The intermediate files are generated from all.edits.RDS. This process requires about 20GB of memory. Download all.edits.RDS, userroles_data.7z, selected.wikis.csv, and code.tar. Unpack code.tar and userroles_data.7z (on a unix system, tar xf code.tar && 7z x userroles_data.7z). Install R dependencies: in R, run install.packages(c("data.table","ggplot2","urltools","texreg","optimx","lme4","bootstrap","scales","effects","lubridate","devtools","roxygen2")). Run 01_build_datasets.R.

    Building all.edits.RDS: The intermediate RDS files used in the analysis are created from all.edits.RDS. To replicate building all.edits.RDS, you only need to run 01_build_datasets.R when the int... Visit https://dataone.org/datasets/sha256%3Acfa4980c107154267d8eb6dc0753ed0fde655a73a062c0c2f5af33f237da3437 for complete metadata about this dataset.

  19. Create On Base Price Prediction Data

    • coinbase.com
    Updated Nov 15, 2025
    + more versions
    Cite
    (2025). Create On Base Price Prediction Data [Dataset]. https://www.coinbase.com/price-prediction/base-create-on-base-caaa
    Explore at:
    Dataset updated
    Nov 15, 2025
    Variables measured
    Growth Rate, Predicted Price
    Measurement technique
    User-defined projections based on compound growth. This is not a formal financial forecast.
    Description

    This dataset contains the predicted prices of the asset Create On Base over the next 16 years. The data is calculated initially using a default 5 percent annual growth rate; after page load, a sliding-scale component lets the user adjust the growth rate to their own positive or negative projections. The maximum adjustable growth rate is 100 percent and the minimum is -100 percent.

  20. Jacob Kaplan's Concatenated Files: Uniform Crime Reporting Program Data: Law...

    • openicpsr.org
    Updated Mar 25, 2018
    Cite
    Jacob Kaplan (2018). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting Program Data: Law Enforcement Officers Killed and Assaulted (LEOKA) 1960-2024 [Dataset]. http://doi.org/10.3886/E102180V15
    Explore at:
    Dataset updated
    Mar 25, 2018
    Dataset provided by
    Princeton University
    Authors
    Jacob Kaplan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1960 - 2024
    Area covered
    United States
    Description

    For a comprehensive guide to this data and other UCR data, please see my book at ucrbook.com.
    Version 15 release notes: Adds .parquet file format.
    Version 14 release notes: Adds 2023 and 2024 data.
    Version 13 release notes: Adds 2022 data.
    Version 12 release notes: Adds 2021 data.
    Version 11 release notes: Adds 2020 data. Please note that the FBI has retired UCR data ending in 2020 data so this will (probably, I haven't seen confirmation either way) be the last LEOKA data they release. Changes .rda file to .rds.
    Version 10 release notes: Changes release notes description, does not change data.
    Version 9 release notes: Adds data for 2019.
    Version 8 release notes: Fix bug for years 1960-1971 where the number of months reported variable was incorrectly down by 1 month. I recommend caution when using these years as they only report either 0 or 12 months of the year, which differs from every other year in the data. Added the variable officers_killed_total which is the sum of officers_killed_by_felony and officers_killed_by_accident.
    Version 7 release notes: Adds data from 2018.
    Version 6 release notes: Adds data in the following formats: SPSS and Excel. Changes project name to avoid confusing this data for the ones done by NACJD.
    Version 5 release notes: Adds data for 1960-1974 and 2017. Note: many columns (including number of female officers) will always have a value of 0 for years prior to 1971. This is because those variables weren't collected prior to 1971. These should be NA, not 0, but I'm keeping it as 0 to be consistent with the raw data. Removes support for .csv and .sav files. Adds a number_of_months_reported variable for each agency-year; a month is considered reported if the month_indicator column for that month has a value of "normal update" or "reported, not data." The formatting of the monthly data has changed from wide to long, meaning that each agency-month has a single row. The old data had each agency being a single row with each month-category (e.g. jan_officers_killed_by_felony) being a column. Now there will just be a single column for each category (e.g. officers_killed_by_felony) and the month can be identified in the month column. This also results in most column names changing. As such, be careful when aggregating the monthly data, since some variables are the same every month (e.g. number of officers employed is measured annually), so aggregating will be 12 times as high as the real value for those variables. Adds a date column. This date column is always set to the first of the month. It is NOT the date that a crime occurred or was reported; it is only there to make it easier to create time-series graphs that require a date input. All the data in this version was acquired from the FBI as text/DAT files and read into R using the package asciiSetupReader. The FBI also provided a PDF file explaining how to create the setup file to read the data. Both the FBI's PDF and the setup file I made are included in the zip files. Data is the same as from NACJD but using all FBI files makes cleaning easier as all column names are already identical.
    Version 4 release notes: Add data for 2016. Order rows by year (descending) and ORI.
    Version 3 release notes: Fix bug where Philadelphia Police Department had incorrect FIPS county code.
    The LEOKA data sets contain highly detailed data about the number of officers/civilians employed by an agency and how many officers were killed or assaulted. All the data was acquired from the FBI as text/DAT files and read into R using the package asciiSetupReader. The FBI also provided a PDF file explaining how to create the setup file to read the data. Both the FBI's PDF and the setup file I made are included in the zip files. About 7% of all agencies in the data report more officers or civilians than population. As such, I removed the officers/civilians per 1,000 population variables. You should exercise caution if deciding to generate and use these variables yourself. Several agencies had impossibly large (>15) officer deaths in a single month; for those months I changed the value to NA. The UCR Handbook (https://ucr.fbi.gov/additional-ucr-publications/ucr_handbook.pdf/view) describes the LEOKA data as follows: "The UCR Program collects data from all contributing agencies ... on officer line-of-duty deaths and assaults. Reporting agencies must submit data on ... their own duly sworn officers f
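    To make the aggregation caution from the Version 5 notes concrete, a hedged pandas sketch: monthly counts can be summed to the agency-year level, but annually measured variables repeat across the 12 monthly rows and must be taken once, not summed. The file name and the officers_employed column are assumptions; ori, year, and officers_killed_total follow the description:

```python
# Hedged sketch of aggregating the long-format monthly LEOKA data to
# agency-year level without inflating annual variables 12-fold.
import pandas as pd

leoka = pd.read_csv("leoka_monthly.csv")   # hypothetical extract

yearly = leoka.groupby(["ori", "year"]).agg(
    officers_killed_total=("officers_killed_total", "sum"),  # monthly count: sum
    officers_employed=("officers_employed", "first"),        # annual value: take once
).reset_index()

print(yearly.head())
```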
