100+ datasets found
  1. VegeNet - Image datasets and Codes

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 27, 2022
    Cite
    Jo Yen Tan (2022). VegeNet - Image datasets and Codes [Dataset]. http://doi.org/10.5281/zenodo.7254508
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 27, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jo Yen Tan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Compilation of Python code for data preprocessing and VegeNet model building, together with the image datasets (zip files).

    Image datasets:

    1. vege_original : Images of vegetables captured manually during the data acquisition stage
    2. vege_cropped_renamed : Images in (1) cropped to remove background areas, with image labels renamed
    3. non-vege images : Images of non-vegetable foods, so that the CNN can recognize foods other than vegetables
    4. food_image_dataset : Complete set of vege (2) and non-vege (3) images for architecture building.
    5. food_image_dataset_split : Image dataset (4) split into train and test sets (see the sketch after this list)
    6. process : Images created when cropping (pre-processing step) to create dataset (2).
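
    For orientation, here is a minimal sketch of how an image dataset like (4) could be split into train and test folders, as in item (5); the directory names and the 80/20 split are illustrative assumptions and do not reflect the published VegeNet code.

    import random
    import shutil
    from pathlib import Path

    def split_dataset(src_dir, out_dir, test_fraction=0.2, seed=42):
        # Copy images from src_dir/<class>/ into out_dir/train/<class>/ and out_dir/test/<class>/.
        random.seed(seed)
        src, out = Path(src_dir), Path(out_dir)
        for class_dir in sorted(p for p in src.iterdir() if p.is_dir()):
            images = sorted(class_dir.glob("*.jpg"))
            random.shuffle(images)
            n_test = int(len(images) * test_fraction)
            for split, files in (("test", images[:n_test]), ("train", images[n_test:])):
                dest = out / split / class_dir.name
                dest.mkdir(parents=True, exist_ok=True)
                for f in files:
                    shutil.copy2(f, dest / f.name)

    # Hypothetical folder names; the actual names come from the zip archives above.
    split_dataset("food_image_dataset", "food_image_dataset_split")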
  2. Amount of data created, consumed, and stored 2010-2023, with forecasts to...

    • statista.com
    Updated Nov 21, 2024
    Cite
    Statista (2024). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Explore at:
    Dataset updated
    Nov 21, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 2024
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching 149 zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than 394 zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

    Storage capacity also growing

    Only a small percentage of this newly created data is kept, though, as just two percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of 19.2 percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached 6.7 zettabytes.
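
    As a rough arithmetic check of the storage figures above (not a Statista number), compounding the 2020 installed base of 6.7 zettabytes at the stated 19.2 percent CAGR over the 2020 to 2025 forecast period gives roughly 16 zettabytes:

    # Compound annual growth: capacity_2025 = capacity_2020 * (1 + CAGR) ** years
    capacity_2020 = 6.7   # zettabytes, installed base in 2020
    cagr = 0.192          # 19.2 percent per year
    years = 5             # 2020 -> 2025
    capacity_2025 = capacity_2020 * (1 + cagr) ** years
    print(round(capacity_2025, 1))  # about 16.1 zettabytes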

  3. 02.1 Integrating Data in ArcGIS Pro

    • hub.arcgis.com
    Updated Feb 15, 2017
    Cite
    Iowa Department of Transportation (2017). 02.1 Integrating Data in ArcGIS Pro [Dataset]. https://hub.arcgis.com/documents/cd5acdcc91324ea383262de3ecec17d0
    Explore at:
    Dataset updated
    Feb 15, 2017
    Dataset authored and provided by
    Iowa Department of Transportation
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    You have been assigned a new project, which you have researched, and you have identified the data that you need. The next step is to gather, organize, and potentially create the data that you need for your project analysis. In this course, you will learn how to gather and organize data using ArcGIS Pro. You will also create a file geodatabase where you will store the data that you import and create. After completing this course, you will be able to perform the following tasks:

    • Create a geodatabase in ArcGIS Pro.
    • Create feature classes in ArcGIS Pro by exporting and importing data.
    • Create a new, empty feature class in ArcGIS Pro.
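
    For readers who prefer scripting, a minimal arcpy sketch of the same workflow (create a file geodatabase, import existing data, create a new empty feature class) follows; the paths and names are placeholders, and the course itself performs these steps interactively in ArcGIS Pro.

    import arcpy

    # Create a file geodatabase to hold the project data (hypothetical paths).
    arcpy.management.CreateFileGDB(r"C:\Projects\Demo", "ProjectData.gdb")
    gdb = r"C:\Projects\Demo\ProjectData.gdb"

    # Import an existing shapefile into the geodatabase as a feature class.
    arcpy.conversion.FeatureClassToGeodatabase(r"C:\Data\roads.shp", gdb)

    # Create a new, empty point feature class for features you will add later.
    arcpy.management.CreateFeatureclass(gdb, "SurveyPoints", geometry_type="POINT",
                                        spatial_reference=arcpy.SpatialReference(4326))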

  4. Synthetic Data for an Imaginary Country, Sample, 2023 - World

    • microdata.worldbank.org
    Updated Jul 7, 2023
    Cite
    Development Data Group, Data Analytics Unit (2023). Synthetic Data for an Imaginary Country, Sample, 2023 - World [Dataset]. https://microdata.worldbank.org/index.php/catalog/5906
    Explore at:
    Dataset updated
    Jul 7, 2023
    Dataset authored and provided by
    Development Data Group, Data Analytics Unit
    Time period covered
    2023
    Area covered
    World
    Description

    Abstract

    The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, asset ownership). The data only include ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.

    The full-population dataset (with about 10 million individuals) is also distributed as open data.

    Geographic coverage

    The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.

    Analysis unit

    Household, Individual

    Universe

    The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.

    Kind of data

    ssd

    Sampling procedure

    The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
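
    The two-stage design described above can be sketched in a few lines; the official sample was drawn with the distributed R script, so the following pandas version is purely illustrative and the frame and column names (ea_frame, hh_frame, geo_1, urban, ea_id) are assumptions.

    import pandas as pd

    def draw_sample(ea_frame, hh_frame, n_households=8000, hh_per_ea=25, seed=1):
        # Stage 1: allocate enumeration areas to strata (geo_1 x urban/rural) proportionally to stratum size.
        n_eas_total = n_households // hh_per_ea   # 320 EAs for 8,000 households at 25 per EA
        strata_size = ea_frame.groupby(["geo_1", "urban"]).size()
        ea_alloc = (strata_size / strata_size.sum() * n_eas_total).round().astype(int)
        selected = []
        for (geo_1, urban), n_eas in ea_alloc.items():
            pool = ea_frame[(ea_frame.geo_1 == geo_1) & (ea_frame.urban == urban)]
            selected.append(pool.sample(n=n_eas, random_state=seed))
        selected_eas = pd.concat(selected)
        # Stage 2: 25 households drawn at random within each selected enumeration area.
        hh_pool = hh_frame[hh_frame.ea_id.isin(selected_eas.ea_id)]
        return hh_pool.groupby("ea_id", group_keys=False).apply(
            lambda g: g.sample(n=min(hh_per_ea, len(g)), random_state=seed))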

    Mode of data collection

    other

    Research instrument

    The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.

    Cleaning operations

    The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observations were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to produce the distributed data files.

    Response rate

    This is a synthetic dataset; the "response rate" is 100%.

  5. Generate-Distill-Data

    • huggingface.co
    Updated Apr 28, 2025
    Cite
    JHU Human Language Technology Center of Excellence (2025). Generate-Distill-Data [Dataset]. https://huggingface.co/datasets/hltcoe/Generate-Distill-Data
    Explore at:
    Dataset updated
    Apr 28, 2025
    Dataset authored and provided by
    JHU Human Language Technology Center of Excellence
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    hltcoe/Generate-Distill-Data dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. R codes and dataset for Visualisation of Diachronic Constructional Change...

    • researchdata.edu.au
    • bridges.monash.edu
    Updated Apr 1, 2019
    Cite
    Gede Primahadi Wijaya Rajeg (2019). R codes and dataset for Visualisation of Diachronic Constructional Change using Motion Chart [Dataset]. http://doi.org/10.26180/5c844c7a81768
    Explore at:
    Dataset updated
    Apr 1, 2019
    Dataset provided by
    Monash University
    Authors
    Gede Primahadi Wijaya Rajeg
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Publication


    Primahadi Wijaya R., Gede. 2014. Visualisation of diachronic constructional change using Motion Chart. In Zane Goebel, J. Herudjati Purwoko, Suharno, M. Suryadi & Yusuf Al Aried (eds.). Proceedings: International Seminar on Language Maintenance and Shift IV (LAMAS IV), 267-270. Semarang: Universitas Diponegoro. doi: https://doi.org/10.4225/03/58f5c23dd8387

    Description of R codes and data files in the repository

    This repository is imported from its GitHub repo. Versioning of this figshare repository is tied to the GitHub repo's Releases, so check the Releases page for updates (the next version is to include a unified, tidyverse-based revision of the code from the first release).

    The raw input data consists of two files (i.e. will_INF.txt and go_INF.txt). They represent the co-occurrence frequency of top-200 infinitival collocates for will and be going to respectively across the twenty decades of Corpus of Historical American English (from the 1810s to the 2000s).

    These two input files are used in the R code file 1-script-create-input-data-raw.r. The code preprocesses and combines the two files into a long-format data frame consisting of the following columns: (i) decade, (ii) coll (for "collocate"), (iii) BE going to (frequency of the collocates with be going to), and (iv) will (frequency of the collocates with will); the result is available in input_data_raw.txt.

    Then, the script 2-script-create-motion-chart-input-data.R processes the input_data_raw.txt for normalising the co-occurrence frequency of the collocates per million words (the COHA size and normalising base frequency are available in coha_size.txt). The output from the second script is input_data_futurate.txt.
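
    A rough pandas equivalent of what these two R scripts do (combine the raw files into the long-format frame, then normalise per million words) is sketched below; the column names for the raw files and the COHA size file are assumptions, and the authoritative pipeline is the R code in the repository.

    import pandas as pd

    going_to = pd.read_csv("go_INF.txt", sep="\t")    # assumed columns: decade, coll, freq
    will = pd.read_csv("will_INF.txt", sep="\t")

    # Combine into the long-format frame described above: decade, coll, "BE going to", "will".
    raw = (going_to.rename(columns={"freq": "BE going to"})
           .merge(will.rename(columns={"freq": "will"}), on=["decade", "coll"], how="outer")
           .fillna(0))
    raw.to_csv("input_data_raw.txt", sep="\t", index=False)

    # Normalise each construction's frequency per million words using the COHA decade sizes.
    coha_size = pd.read_csv("coha_size.txt", sep="\t")  # assumed columns: decade, word_count
    futurate = raw.merge(coha_size, on="decade")
    for construction in ["BE going to", "will"]:
        futurate[construction] = futurate[construction] / futurate["word_count"] * 1_000_000
    futurate.to_csv("input_data_futurate.txt", sep="\t", index=False)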

    Next, input_data_futurate.txt contains the relevant input data for generating (i) the static motion chart as an image plot in the publication (using the script 3-script-create-motion-chart-plot.R), and (ii) the dynamic motion chart (using the script 4-script-motion-chart-dynamic.R).

    The repository adopts the project-oriented workflow in RStudio; double-click on the Future Constructions.Rproj file to open an RStudio session whose working directory is associated with the contents of this repository.

  7. Footprints and producers of source data used to create southern portion of...

    • catalog.data.gov
    • data.usgs.gov
    • +2more
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Footprints and producers of source data used to create southern portion of the high-resolution (1 m) San Francisco Bay, California, digital elevation model (DEM) [Dataset]. https://catalog.data.gov/dataset/footprints-and-producers-of-source-data-used-to-create-southern-portion-of-the-high-resolu
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    San Francisco Bay, California
    Description

    Polygon shapefile showing the footprint boundaries, source agency origins, and resolutions of compiled bathymetric digital elevation models (DEMs) used to construct a continuous, high-resolution DEM of the southern portion of San Francisco Bay.

  8. Synthetic Integrated Services Data

    • data.wprdc.org
    csv, html, pdf, zip
    Updated Jun 25, 2024
    Cite
    Allegheny County (2024). Synthetic Integrated Services Data [Dataset]. https://data.wprdc.org/dataset/synthetic-integrated-services-data
    Explore at:
    Available download formats: html, zip (39231637), csv (1375554033), pdf
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    Allegheny County
    Description

    Motivation

    This dataset was created to pilot techniques for creating synthetic data from datasets containing sensitive and protected information in the local government context. Synthetic data generation replaces actual data with representative data generated from statistical models; this preserves the key data properties that allow insights to be drawn from the data while protecting the privacy of the people included in the data. We invite you to read the Understanding Synthetic Data white paper for a concise introduction to synthetic data.

    This effort was a collaboration of the Urban Institute, Allegheny County’s Department of Human Services (DHS) and CountyStat, and the University of Pittsburgh’s Western Pennsylvania Regional Data Center.

    Collection

    The source data for this project consisted of 1) month-by-month records of services included in Allegheny County's data warehouse and 2) demographic data about the individuals who received the services. As the County’s data warehouse combines this service and client data, this data is referred to as “Integrated Services data”. Read more about the data warehouse and the kinds of services it includes here.

    Preprocessing

    Synthetic data are typically generated from probability distributions or models identified as being representative of the confidential data. For this dataset, a model of the Integrated Services data was used to generate multiple versions of the synthetic dataset. These different candidate datasets were evaluated to select for publication the dataset version that best balances utility and privacy. For high-level information about this evaluation, see the Synthetic Data User Guide.
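
    As a toy illustration of this general approach (and not the project's actual method or variables), one can fit a simple per-column model to a confidential table, draw several synthetic candidates, and score each candidate against the original on its marginal distributions before picking one to publish:

    import numpy as np
    import pandas as pd

    def fit_model(df):
        # Toy "model": per-column category probabilities (real projects use richer joint models).
        return {col: df[col].value_counts(normalize=True) for col in df.columns}

    def sample_synthetic(model, n_rows, rng):
        return pd.DataFrame({col: rng.choice(probs.index, size=n_rows, p=probs.values)
                             for col, probs in model.items()})

    def utility_gap(real, synth):
        # Crude utility check: total absolute difference between marginal distributions.
        return sum((real[c].value_counts(normalize=True)
                    .sub(synth[c].value_counts(normalize=True), fill_value=0)
                    .abs().sum()) for c in real.columns)

    rng = np.random.default_rng(0)
    confidential = pd.DataFrame({"service": rng.choice(["A", "B", "C"], 5000, p=[0.5, 0.3, 0.2]),
                                 "age_group": rng.choice(["0-17", "18-64", "65+"], 5000)})
    model = fit_model(confidential)
    candidates = [sample_synthetic(model, len(confidential), rng) for _ in range(3)]
    best = min(candidates, key=lambda cand: utility_gap(confidential, cand))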

    For more information about the creation of the synthetic version of this data, see the technical brief for this project, which discusses the technical decision making and modeling process in more detail.

    Recommended Uses

    This disaggregated synthetic data allows for many analyses that are not possible with aggregate data (summary statistics). Broadly, this synthetic version of this data could be analyzed to better understand the usage of human services by people in Allegheny County, including the interplay in the usage of multiple services and demographic information about clients.

    Known Limitations/Biases

    Some amount of deviation from the original data is inherent to the synthetic data generation process. Specific examples of limitations (including undercounts and overcounts for the usage of different services) are given in the Synthetic Data User Guide and the technical report describing this dataset's creation.

    Feedback

    Please reach out to this dataset's data steward (listed below) to let us know how you are using this data and if you found it to be helpful. Please also provide any feedback on how to make this dataset more applicable to your work, any suggestions of future synthetic datasets, or any additional information that would make this more useful. Also, please copy wprdc@pitt.edu on any such feedback (as the WPRDC always loves to hear about how people use the data that they publish and how the data could be improved).

    Further Documentation and Resources

    1) A high-level overview of synthetic data generation as a method for protecting privacy can be found in the Understanding Synthetic Data white paper.
    2) The Synthetic Data User Guide provides high-level information to help users understand the motivation, evaluation process, and limitations of the synthetic version of Allegheny County DHS's Human Services data published here.
    3) Generating a Fully Synthetic Human Services Dataset: A Technical Report on Synthesis and Evaluation Methodologies describes the full technical methodology used for generating the synthetic data, evaluating the various options, and selecting the final candidate for publication.
    4) The WPRDC also hosts the Allegheny County Human Services Community Profiles dataset, which provides annual updates on human-services usage, aggregated by neighborhood/municipality. That data can be explored using the County's Human Services Community Profile web site.

  9. How to create an Okta Account

    • data.nsw.gov.au
    • researchdata.edu.au
    Updated May 29, 2025
    Cite
    Spatial Services (DCS) (2025). How to create an Okta Account [Dataset]. https://data.nsw.gov.au/data/dataset/1-8584a80239754e66b39a1325f08f5b53
    Explore at:
    Dataset updated
    May 29, 2025
    Dataset provided by
    Spatial Services (DCS)
    Description

    Access API

    Metadata Portal Metadata Information

    Content Title: How to create an Okta Account
    Content Type: Document
    Description: Documentation on how to create an Okta Account
    Initial Publication Date: 09/07/2024
    Data Currency: 09/07/2024
    Data Update Frequency: Other
    Content Source: Data provider files
    File Type: Document
    Attribution
    Data Theme, Classification or Relationship to other Datasets
    Accuracy
    Spatial Reference System (dataset): Other
    Spatial Reference System (web service): Other
    WGS84 Equivalent To: Other
    Spatial Extent
    Content Lineage
    Data Classification: Unclassified
    Data Access Policy: Open
    Data Quality
    Terms and Conditions: Creative Commons
    Standard and Specification
    Data Custodian: Customer Hub
    Point of Contact: Customer Hub
    Data Aggregator
    Data Distributor
    Additional Supporting Information
    TRIM Number

  10. Data from: 10-m backscatter mosaic produced from backscatter intensity data...

    • catalog.data.gov
    • search.dataone.org
    • +3more
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). 10-m backscatter mosaic produced from backscatter intensity data from sidescan sonar and multibeam datasets (BS_composite_10m.tif GeoTIFF Image; UTM, Zone 19N, WGS 84) [Dataset]. https://catalog.data.gov/dataset/10-m-backscatter-mosaic-produced-from-backscatter-intensity-data-from-sidescan-sonar-and-m
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    These data are qualitatively derived interpretive polygon shapefiles and selected source raster data defining surficial geology, sediment type and distribution, and physiographic zones of the sea floor from Nahant to Northern Cape Cod Bay. Much of the geophysical data used to create the interpretive layers were collected under a cooperative agreement among the Massachusetts Office of Coastal Zone Management (CZM), the U.S. Geological Survey (USGS), Coastal and Marine Geology Program, the National Oceanic and Atmospheric Administration (NOAA), and the U.S. Army Corps of Engineers (USACE). Initiated in 2003, the primary objective of this program is to develop regional geologic framework information for the management of coastal and marine resources. Accurate data and maps of seafloor geology are important first steps toward protecting fish habitat, delineating marine resources, and assessing environmental changes because of natural or human effects. The project is focused on the inshore waters of coastal Massachusetts. Data collected during the mapping cooperative involving the USGS have been released in a series of USGS Open-File Reports (http://woodshole.er.usgs.gov/project-pages/coastal_mass/html/current_map.html). The interpretations released in this study are for an area extending from the southern tip of Nahant to Northern Cape Cod Bay, Massachusetts. A combination of geophysical and sample data including high resolution bathymetry and lidar, acoustic-backscatter intensity, seismic-reflection profiles, bottom photographs, and sediment samples are used to create the data interpretations. Most of the nearshore geophysical and sample data (including the bottom photographs) were collected during several cruises between 2000 and 2008. More information about the cruises and the data collected can be found at the Geologic Mapping of the Seafloor Offshore of Massachusetts Web page: http://woodshole.er.usgs.gov/project-pages/coastal_mass/.

  11. Small Business Contact Data | North American Small Business Owners |...

    • datarade.ai
    Updated Oct 27, 2021
    Cite
    Success.ai (2021). Small Business Contact Data | North American Small Business Owners | Verified Contact Details from 170M Profiles | Best Price Guaranteed [Dataset]. https://datarade.ai/data-products/small-business-contact-data-north-american-small-business-o-success-ai
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Oct 27, 2021
    Dataset provided by
    Success.ai
    Area covered
    Guatemala, Greenland, Saint Pierre and Miquelon, Costa Rica, Honduras, Belize, Panama, Bermuda, Mexico, United States of America
    Description

    Access B2B Contact Data for North American Small Business Owners with Success.ai—your go-to provider for verified, high-quality business datasets. This dataset is tailored for businesses, agencies, and professionals seeking direct access to decision-makers within the small business ecosystem across North America. With over 170 million professional profiles, it’s an unparalleled resource for powering your marketing, sales, and lead generation efforts.

    Key Features of the Dataset:

    Verified Contact Details

    Includes accurate and up-to-date email addresses and phone numbers to ensure you reach your targets reliably.

    AI-validated for 99% accuracy, eliminating errors and reducing wasted efforts.

    Detailed Professional Insights

    Comprehensive data points include job titles, skills, work experience, and education to enable precise segmentation and targeting.

    Enriched with insights into decision-making roles, helping you connect directly with small business owners, CEOs, and other key stakeholders.

    Business-Specific Information

    Covers essential details such as industry, company size, location, and more, enabling you to tailor your campaigns effectively. Ideal for profiling and understanding the unique needs of small businesses.

    Continuously Updated Data

    Our dataset is maintained and updated regularly to ensure relevance and accuracy in fast-changing market conditions. New business contacts are added frequently, helping you stay ahead of the competition.

    Why Choose Success.ai?

    At Success.ai, we understand the critical importance of high-quality data for your business success. Here’s why our dataset stands out:

    Tailored for Small Business Engagement

    Focused specifically on North American small business owners, this dataset is an invaluable resource for building relationships with SMEs (Small and Medium Enterprises). Whether you’re targeting startups, local businesses, or established small enterprises, our dataset has you covered.

    Comprehensive Coverage Across North America

    Spanning the United States, Canada, and Mexico, our dataset ensures wide-reaching access to verified small business contacts in the region.

    Categories Tailored to Your Needs

    Includes highly relevant categories such as Small Business Contact Data, CEO Contact Data, B2B Contact Data, and Email Address Data to match your marketing and sales strategies.

    Customizable and Flexible

    Choose from a wide range of filtering options to create datasets that meet your exact specifications, including filtering by industry, company size, geographic location, and more.

    Best Price Guaranteed

    We pride ourselves on offering the most competitive rates without compromising on quality. When you partner with Success.ai, you receive superior data at the best value.

    Seamless Integration

    Delivered in formats that integrate effortlessly with your CRM, marketing automation, or sales platforms, so you can start acting on the data immediately.

    Use Cases: This dataset empowers you to:

    Drive Sales Growth: Build and refine your sales pipeline by connecting directly with decision-makers in small businesses.

    Optimize Marketing Campaigns: Launch highly targeted email and phone outreach campaigns with verified contact data.

    Expand Your Network: Leverage the dataset to build relationships with small business owners and other key figures within the B2B landscape.

    Improve Data Accuracy: Enhance your existing databases with verified, enriched contact information, reducing bounce rates and increasing ROI.

    Industries Served: Whether you're in B2B SaaS, digital marketing, consulting, or any field requiring accurate and targeted contact data, this dataset serves industries of all kinds. It is especially useful for professionals focused on:

    • Lead Generation
    • Business Development
    • Market Research
    • Sales Outreach
    • Customer Acquisition

    What’s Included in the Dataset: Each profile provides:

    • Full Name
    • Verified Email Address
    • Phone Number (where available)
    • Job Title
    • Company Name
    • Industry
    • Company Size
    • Location
    • Skills and Professional Experience
    • Education Background

    With over 170 million profiles, you can tap into a wealth of opportunities to expand your reach and grow your business.

    Why High-Quality Contact Data Matters: Accurate, verified contact data is the foundation of any successful B2B strategy. Reaching small business owners and decision-makers directly ensures your message lands where it matters most, reducing costs and improving the effectiveness of your campaigns. By choosing Success.ai, you ensure that every contact in your pipeline is a genuine opportunity.

    Partner with Success.ai for Better Data, Better Results: Success.ai is committed to delivering premium-quality B2B data solutions at scale. With our small business owner dataset, you can unlock the potential of North America's dynamic small business market.

    Get Started Today: Request a sample or customize your dataset to fit your unique...

  12. Data from: Database Creator for Mass Analysis of Peptides and Proteins,...

    • figshare.com
    • acs.figshare.com
    txt
    Updated Aug 1, 2023
    Cite
    Pandi Boomathi Pandeswari; Arnold Emerson Isaac; Varatharajan Sabareesh (2023). Database Creator for Mass Analysis of Peptides and Proteins, DC-MAPP: A Standalone Tool for Simplifying Manual Analysis of Mass Spectral Data to Identify Peptide/Protein Sequences [Dataset]. http://doi.org/10.1021/jasms.3c00030.s005
    Explore at:
    Available download formats: txt
    Dataset updated
    Aug 1, 2023
    Dataset provided by
    ACS Publications
    Authors
    Pandi Boomathi Pandeswari; Arnold Emerson Isaac; Varatharajan Sabareesh
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Proteomic studies typically involve the use of different types of software for annotating experimental tandem mass spectrometric (MS/MS) data and thereby simplifying the process of peptide and protein identification. For such annotations, these software tools calculate the m/z values of the peptide/protein precursor and fragment ions, for which a database of protein sequences must be provided as an input file. The calculated m/z values are stored as another database, which the user usually cannot view. Database Creator for Mass Analysis of Peptides and Proteins (DC-MAPP) is a novel standalone software that can create custom databases for “viewing” the calculated m/z values of precursor and fragment ions prior to the database search. It contains three modules. Peptide/protein sequences of the user’s choice can be entered as input to the first module to create a custom database. In the second module, m/z values are queried and searched within the custom database to identify protein/peptide sequences. The third module is suited for peptide mass fingerprinting and can be used to analyze both ESI and MALDI mass spectral data. The feature of “viewing” the custom database can be helpful not only for better understanding the search engine processes, but also for designing multiple reaction monitoring (MRM) methods. Post-translational modifications and protein isoforms can also be analyzed. Since DC-MAPP relies on the protein/peptide “sequences” for creating custom databases, it may not be applicable to searches involving spectral libraries. The Python language was used for implementation, and the graphical user interface was built with Page/Tcl, making this tool more user-friendly. It is freely available at https://vit.ac.in/DC-MAPP/.
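
    For a sense of the arithmetic such databases hold, the precursor m/z of a peptide can be computed from standard monoisotopic residue masses as (M + z x 1.00728) / z, where M is the neutral monoisotopic mass; the sketch below uses generic textbook values and is not part of DC-MAPP.

    # Monoisotopic residue masses (Da) of the 20 standard amino acid residues.
    RESIDUE_MASS = {
        "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
        "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
        "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
        "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
    }
    WATER = 18.01056    # terminal H and OH
    PROTON = 1.00728

    def precursor_mz(sequence, charge):
        # m/z of an [M + zH]z+ ion computed from the peptide sequence.
        neutral_mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
        return (neutral_mass + charge * PROTON) / charge

    print(round(precursor_mz("PEPTIDE", 2), 3))   # doubly protonated ion of the peptide "PEPTIDE"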

  13. Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter...

    • zenodo.org
    • explore.openaire.eu
    bz2
    Updated Mar 15, 2021
    Cite
    João Felipe; Leonardo; Vanessa; Juliana (2021). Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks [Dataset]. http://doi.org/10.5281/zenodo.2592524
    Explore at:
    Available download formats: bz2
    Dataset updated
    Mar 15, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    João Felipe; Leonardo; Vanessa; Juliana
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices, and produces results that can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.

    Paper: https://2019.msrconf.org/event/msr-2019-papers-a-large-scale-study-about-quality-and-reproducibility-of-jupyter-notebooks

    This repository contains two files:

    • dump.tar.bz2
    • jupyter_reproducibility.tar.bz2

    The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.

    The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:

    • analyses: this folder has all the notebooks we use to analyze the data in the PostgreSQL database.
    • archaeology: this folder has all the scripts we use to query, download, and extract data from GitHub notebooks.
    • paper: empty. The notebook analyses/N12.To.Paper.ipynb moves data to it

    In the remainder of this text, we give instructions for reproducing the analyses using the data provided in the dump, and for reproducing the collection by collecting data from GitHub again.

    Reproducing the Analysis

    This section shows how to load the data in the database and run the analyses notebooks. In the analysis, we used the following environment:

    Ubuntu 18.04.1 LTS
    PostgreSQL 10.6
    Conda 4.5.11
    Python 3.7.2
    PdfCrop 2012/11/02 v1.38

    First, download dump.tar.bz2 and extract it:

    tar -xjf dump.tar.bz2

    It extracts the file db2019-03-13.dump. Create a database in PostgreSQL (we call it "jupyter"), and use psql to restore the dump:

    psql jupyter < db2019-03-13.dump

    It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTION:

    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Create a conda environment with Python 3.7:

    conda create -n analyses python=3.7
    conda activate analyses

    Go to the analyses folder and install all the dependencies listed in requirements.txt:

    cd jupyter_reproducibility/analyses
    pip install -r requirements.txt

    For reproducing the analyses, run jupyter on this folder:

    jupyter notebook

    Execute the notebooks in this order:

    • Index.ipynb
    • N0.Repository.ipynb
    • N1.Skip.Notebook.ipynb
    • N2.Notebook.ipynb
    • N3.Cell.ipynb
    • N4.Features.ipynb
    • N5.Modules.ipynb
    • N6.AST.ipynb
    • N7.Name.ipynb
    • N8.Execution.ipynb
    • N9.Cell.Execution.Order.ipynb
    • N10.Markdown.ipynb
    • N11.Repository.With.Notebook.Restriction.ipynb
    • N12.To.Paper.ipynb

    Reproducing or Expanding the Collection

    The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.

    Requirements

    This time, we have extra requirements:

    All the analysis requirements
    lbzip2 2.5
    gcc 7.3.0
    Github account
    Gmail account

    Environment

    First, set the following environment variables:

    export JUP_MACHINE="db"; # machine identifier
    export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories
    export JUP_LOGS_DIR="/home/jupyter/logs"; # log files
    export JUP_COMPRESSION="lbzip2"; # compression program
    export JUP_VERBOSE="5"; # verbose level
    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlalchemy connection
    export JUP_GITHUB_USERNAME="github_username"; # your github username
    export JUP_GITHUB_PASSWORD="github_password"; # your github password
    export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB)
    export JUP_FIRST_DATE="2013-01-01"; # initial date to query github
    export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address
    export JUP_EMAIL_TO="target@email.com"; # email that receives notifications
    export JUP_OAUTH_FILE="~/oauth2_creds.json" # oauth2 authentication file
    export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine. Leave it blank
    export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine. Leave it blank
    export JUP_WITH_EXECUTION="1"; # execute python notebooks
    export JUP_WITH_DEPENDENCY="0"; # run notebooks with and without declared dependencies
    export JUP_EXECUTION_MODE="-1"; # run following the execution order
    export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks
    export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path
    export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir
    export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to umount base dir
    export JUP_NOTEBOOK_TIMEOUT="300"; # timeout the extraction
    
    
    # Frequency of log reports
    export JUP_ASTROID_FREQUENCY="5";
    export JUP_IPYTHON_FREQUENCY="5";
    export JUP_NOTEBOOKS_FREQUENCY="5";
    export JUP_REQUIREMENT_FREQUENCY="5";
    export JUP_CRAWLER_FREQUENCY="1";
    export JUP_CLONE_FREQUENCY="1";
    export JUP_COMPRESS_FREQUENCY="5";
    
    export JUP_DB_IP="localhost"; # postgres database IP

    Then, configure the file ~/oauth2_creds.json, according to yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf

    Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the directories; the second one should umount it. You can leave the scripts blank, but it is not advisable, as the reproducibility study runs arbitrary code on your machine and you may lose your data.

    Scripts

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Install 5 conda environments and 5 anaconda environments, one for each Python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (note that it is a local package that has not been published to PyPI; make sure to use the -e option):

    Conda 2.7

    conda create -n raw27 python=2.7 -y
    conda activate raw27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 2.7

    conda create -n py27 python=2.7 anaconda -y
    conda activate py27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    

    Conda 3.4

    It requires a manual jupyter and pathlib2 installation due to some incompatibilities found on the default installation.

    conda create -n raw34 python=3.4 -y
    conda activate raw34
    conda install jupyter -c conda-forge -y
    conda uninstall jupyter -y
    pip install --upgrade pip
    pip install jupyter
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    pip install pathlib2

    Anaconda 3.4

    conda create -n py34 python=3.4 anaconda -y
    conda activate py34
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.5

    conda create -n raw35 python=3.5 -y
    conda activate raw35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.5

    It requires the manual installation of other anaconda packages.

    conda create -n py35 python=3.5 anaconda -y
    conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator
    conda activate py35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.6

    conda create -n raw36 python=3.6 -y
    conda activate raw36
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.6

    conda create -n py36 python=3.6 anaconda -y
    conda activate py36
    conda install -y anaconda-navigator jupyterlab_server navigator-updater
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.7


  14. 130k Images (128x128) - Universal Image Embeddings

    • kaggle.com
    Updated Jul 29, 2022
    Cite
    Rohit singh (2022). 130k Images (128x128) - Universal Image Embeddings [Dataset]. https://www.kaggle.com/datasets/rhtsingh/google-universal-image-embeddings-128x128/data
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 29, 2022
    Dataset provided by
    Kaggle
    Authors
    Rohit singh
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Introduction

    This is my scraped, collected, and curated dataset for the Google Universal Image Embedding competition, resized to 128x128. It contains 130k+ images in total; the table below provides a count for each class.

    Data Count

    | apparel | artwork | cars | dishes | furniture | illustrations | landmark | meme | packaged | storefronts | toys |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | 32,226 | 4,957 | 8,144 | 5,831 | 10,488 | 3,347 | 33,063 | 3,301 | 23,382 | 5,387 | 2,402 |

    Data Source

    1. Apparel - Deep Fashion Dataset
    2. Artwork - Google (scraped)
    3. Cars - Stanford Cars Dataset
    4. Dishes - Google (scraped)
    5. Furniture - Google (scraped)
    6. Illustrations - Google (scraped)
    7. Landmark - Google Landmark Dataset
    8. Meme - Google (scraped)
    9. Packaged - Holosecta, Grozi 3.2k, Freiburg Groceries, SKU110K
    10. Storefronts - Google (scraped)
    11. Toys - Google (scraped)

  15. Data Modeling Software Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 16, 2024
    Cite
    Dataintelo (2024). Data Modeling Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/data-modeling-software-market
    Explore at:
    Available download formats: pptx, csv, pdf
    Dataset updated
    Oct 16, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Modeling Software Market Outlook



    The global data modeling software market size was valued at approximately USD 2.5 billion in 2023 and is projected to reach around USD 6.8 billion by 2032, growing at a compound annual growth rate (CAGR) of 11.5% from 2024 to 2032. The market's robust growth can be attributed to the increasing adoption of data-driven decision-making processes across various industries, which necessitates advanced data modeling solutions to manage and analyze large volumes of data efficiently.



    The proliferation of big data and the growing need for data governance are significant drivers for the data modeling software market. Organizations are increasingly recognizing the importance of structured and unstructured data in generating valuable insights. With data volumes exploding, data modeling software becomes essential for creating logical data models that represent business processes and information requirements accurately. This software is crucial for implementation in data warehouses, analytics, and business intelligence applications, further fueling market growth.



    Technological advancements, particularly in artificial intelligence (AI) and machine learning (ML), are also propelling the data modeling software market forward. These technologies enable more sophisticated data models that can predict trends, optimize operations, and enhance decision-making processes. The integration of AI and ML with data modeling tools allows for automated data analysis, reducing the time and effort required for manual processes and improving the accuracy of the results. This technological synergy is a significant growth factor for the market.



    The rise of cloud-based solutions is another critical factor contributing to the market's expansion. Cloud deployment offers numerous advantages, such as scalability, flexibility, and cost-effectiveness, making it an attractive option for businesses of all sizes. Cloud-based data modeling software allows for real-time collaboration and access to data from anywhere, enhancing productivity and efficiency. As more companies move their operations to the cloud, the demand for cloud-compatible data modeling solutions is expected to surge, driving market growth further.



    In terms of regional outlook, North America currently holds the largest share of the data modeling software market. This dominance is due to the high concentration of technology-driven enterprises and a strong emphasis on data analytics and business intelligence in the region. However, the Asia Pacific region is anticipated to witness the highest growth rate during the forecast period. Rapid digital transformation, increased cloud adoption, and the rising importance of data analytics in emerging economies like China and India are key factors contributing to this growth. Europe, Latin America, and the Middle East & Africa also present significant opportunities, albeit at varying growth rates.



    Component Analysis



    In the data modeling software market, the component segment is divided into software and services. The software component is the most significant contributor to the market, driven by the increasing need for advanced data modeling tools that can handle complex data structures and provide accurate insights. Data modeling software includes various tools and platforms that facilitate the creation, management, and optimization of data models. These tools are essential for database design, data architecture, and other data management tasks, making them indispensable for organizations aiming to leverage their data assets effectively.



    Within the software segment, there is a growing trend towards integrating AI and ML capabilities to enhance the functionality of data modeling tools. This integration allows for more sophisticated data analysis, automated model generation, and improved accuracy in predictions and insights. As a result, organizations can achieve better data governance, streamline operations, and make more informed decisions. The demand for such advanced software solutions is expected to rise, contributing significantly to the market's growth.



    The services component, although smaller in comparison to the software segment, plays a crucial role in the data modeling software market. Services include consulting, implementation, training, and support, which are essential for the successful deployment and utilization of data modeling tools. Many organizations lack the in-house expertise to effectively implement and manage data modeling software, leading to increased demand for professional services.

  16. AI Training Dataset Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). AI Training Dataset Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-ai-training-dataset-market
    Explore at:
    Available download formats: csv, pptx, pdf
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    AI Training Dataset Market Outlook



    The global AI training dataset market size was valued at approximately USD 1.2 billion in 2023 and is projected to reach USD 6.5 billion by 2032, growing at a compound annual growth rate (CAGR) of 20.5% from 2024 to 2032. This substantial growth is driven by the increasing adoption of artificial intelligence across various industries, the necessity for large-scale and high-quality datasets to train AI models, and the ongoing advancements in AI and machine learning technologies.



    One of the primary growth factors in the AI training dataset market is the exponential increase in data generation across multiple sectors. With the proliferation of internet usage, the expansion of IoT devices, and the digitalization of industries, there is an unprecedented volume of data being generated daily. This data is invaluable for training AI models, enabling them to learn and make more accurate predictions and decisions. Moreover, the need for diverse and comprehensive datasets to improve AI accuracy and reliability is further propelling market growth.



    Another significant factor driving the market is the rising investment in AI and machine learning by both public and private sectors. Governments around the world are recognizing the potential of AI to transform economies and improve public services, leading to increased funding for AI research and development. Simultaneously, private enterprises are investing heavily in AI technologies to gain a competitive edge, enhance operational efficiency, and innovate new products and services. These investments necessitate high-quality training datasets, thereby boosting the market.



    The proliferation of AI applications in various industries, such as healthcare, automotive, retail, and finance, is also a major contributor to the growth of the AI training dataset market. In healthcare, AI is being used for predictive analytics, personalized medicine, and diagnostic automation, all of which require extensive datasets for training. The automotive industry leverages AI for autonomous driving and vehicle safety systems, while the retail sector uses AI for personalized shopping experiences and inventory management. In finance, AI assists in fraud detection and risk management. The diverse applications across these sectors underline the critical need for robust AI training datasets.



    As the demand for AI applications continues to grow, the role of Ai Data Resource Service becomes increasingly vital. These services provide the necessary infrastructure and tools to manage, curate, and distribute datasets efficiently. By leveraging Ai Data Resource Service, organizations can ensure that their AI models are trained on high-quality and relevant data, which is crucial for achieving accurate and reliable outcomes. The service acts as a bridge between raw data and AI applications, streamlining the process of data acquisition, annotation, and validation. This not only enhances the performance of AI systems but also accelerates the development cycle, enabling faster deployment of AI-driven solutions across various sectors.



    Regionally, North America currently dominates the AI training dataset market due to the presence of major technology companies and extensive R&D activities in the region. However, Asia Pacific is expected to witness the highest growth rate during the forecast period, driven by rapid technological advancements, increasing investments in AI, and the growing adoption of AI technologies across various industries in countries like China, India, and Japan. Europe and Latin America are also anticipated to experience significant growth, supported by favorable government policies and the increasing use of AI in various sectors.



    Data Type Analysis



    The data type segment of the AI training dataset market encompasses text, image, audio, video, and others. Each data type plays a crucial role in training different types of AI models, and the demand for specific data types varies based on the application. Text data is extensively used in natural language processing (NLP) applications such as chatbots, sentiment analysis, and language translation. As the use of NLP is becoming more widespread, the demand for high-quality text datasets is continually rising. Companies are investing in curated text datasets that encompass diverse languages and dialects to improve the accuracy and efficiency of NLP models.



    Image data is critical for computer vision application

  17. Lending Club Loan Data Analysis - Deep Learning

    • kaggle.com
    Updated Aug 9, 2023
    Cite
    Deependra Verma (2023). Lending Club Loan Data Analysis - Deep Learning [Dataset]. https://www.kaggle.com/datasets/deependraverma13/lending-club-loan-data-analysis-deep-learning
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 9, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Deependra Verma
    Description

    DESCRIPTION

    Create a model that predicts whether or not a loan will default, using the historical data.

    Problem Statement:

    For companies like Lending Club, correctly predicting whether or not a loan will default is very important. In this project, using the historical data from 2007 to 2015, you have to build a deep learning model to predict the chance of default for future loans. As you will see later, this dataset is highly imbalanced and includes many features that make this problem more challenging.

    Domain: Finance

    Analysis to be done: Perform data preprocessing and build a deep learning prediction model.

    Content:

    Dataset columns and definition:

    credit.policy: 1 if the customer meets the credit underwriting criteria of LendingClub.com, and 0 otherwise.

    purpose: The purpose of the loan (takes values "credit_card", "debt_consolidation", "educational", "major_purchase", "small_business", and "all_other").

    int.rate: The interest rate of the loan, as a proportion (a rate of 11% would be stored as 0.11). Borrowers judged by LendingClub.com to be more risky are assigned higher interest rates.

    installment: The monthly installments owed by the borrower if the loan is funded.

    log.annual.inc: The natural log of the self-reported annual income of the borrower.

    dti: The debt-to-income ratio of the borrower (amount of debt divided by annual income).

    fico: The FICO credit score of the borrower.

    days.with.cr.line: The number of days the borrower has had a credit line.

    revol.bal: The borrower's revolving balance (amount unpaid at the end of the credit card billing cycle).

    revol.util: The borrower's revolving line utilization rate (the amount of the credit line used relative to total credit available).

    inq.last.6mths: The borrower's number of inquiries by creditors in the last 6 months.

    delinq.2yrs: The number of times the borrower had been 30+ days past due on a payment in the past 2 years.

    pub.rec: The borrower's number of derogatory public records (bankruptcy filings, tax liens, or judgments).

    Steps to perform:

    Perform exploratory data analysis and feature engineering, and then build a deep learning model to predict whether or not a loan will default, using the historical data.

    Tasks:

    1. Feature Transformation

    Transform categorical values into numerical (discrete) values.

    2. Exploratory Data Analysis

    Perform exploratory data analysis of the different factors of the dataset.

    3. Additional Feature Engineering

    Check the correlation between features and drop those features that are strongly correlated.

    This will help reduce the number of features and leave you with the most relevant ones.

    4. Modeling

    After applying EDA and feature engineering, you are now ready to build the predictive model.

    In this part, you will create a deep learning model using Keras with a TensorFlow backend.
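
    A minimal Keras sketch of the kind of model this task asks for is given below; the file name, the "default" target column, and the use of class weights for the imbalance are illustrative assumptions, since the listing does not spell these details out.

    import numpy as np
    import pandas as pd
    import tensorflow as tf
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.utils.class_weight import compute_class_weight

    df = pd.read_csv("loan_data.csv")                               # hypothetical file name
    df = pd.get_dummies(df, columns=["purpose"], drop_first=True)   # categorical -> numerical
    y = df.pop("default").astype("float32")                         # assumed binary target column
    X = df.astype("float32")

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
    scaler = StandardScaler()
    X_train, X_test = scaler.fit_transform(X_train), scaler.transform(X_test)

    # Class weights compensate for the heavy class imbalance noted in the problem statement.
    w = compute_class_weight("balanced", classes=np.array([0.0, 1.0]), y=y_train)
    class_weight = {0: w[0], 1: w[1]}

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),             # probability of default
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])
    model.fit(X_train, y_train.values, validation_split=0.1, epochs=20,
              batch_size=256, class_weight=class_weight)
    print(model.evaluate(X_test, y_test.values))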

  18. Data Science Platform Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 16, 2024
    Cite
    Dataintelo (2024). Data Science Platform Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/data-science-platform-market
    Explore at:
    Available download formats: pdf, csv, pptx
    Dataset updated
    Oct 16, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Science Platform Market Outlook



    The global data science platform market size was valued at approximately USD 49.3 billion in 2023 and is projected to reach USD 174.4 billion by 2032, growing at a compound annual growth rate (CAGR) of 15.1% during the forecast period. This exponential growth can be attributed to the increasing demand for data-driven decision-making processes, the surge in big data technologies, and the need for more advanced analytics solutions across various industries.



    One of the primary growth factors driving the data science platform market is the rapid digital transformation efforts undertaken by organizations globally. Companies are shifting towards data-centric business models to gain a competitive edge, improve operational efficiency, and enhance customer experiences. The proliferation of IoT devices and the subsequent explosion of data generated have further propelled the need for sophisticated data science platforms capable of analyzing vast datasets in real-time. This transformation is not only seen in large enterprises but also increasingly in small and medium enterprises (SMEs) that recognize the potential of data analytics in driving business growth.



    Moreover, the advancements in artificial intelligence (AI) and machine learning (ML) technologies have significantly augmented the capabilities of data science platforms. These technologies enable the automation of complex data analysis processes, allowing for more accurate predictions and insights. As a result, sectors such as healthcare, finance, and retail are increasingly adopting data science solutions to leverage AI and ML for personalized services, fraud detection, and supply chain optimization. The integration of AI/ML into data science platforms is thus a critical factor contributing to market growth.



    Another crucial factor is the growing regulatory and compliance requirements across various industries. Organizations are mandated to ensure data accuracy, security, and privacy, necessitating the adoption of robust data science platforms that can handle these aspects efficiently. The implementation of regulations such as the General Data Protection Regulation (GDPR) in Europe and the California Consumer Privacy Act (CCPA) in the United States has compelled organizations to invest in advanced data management and analytics solutions. These regulatory frameworks are not only a challenge but also an opportunity for the data science platform market to innovate and provide compliant solutions.



    Regionally, North America dominates the data science platform market due to the early adoption of advanced technologies, a strong presence of key market players, and significant investments in research and development. However, the Asia Pacific region is expected to witness the highest growth rate during the forecast period. This growth can be attributed to the increasing digitalization initiatives, a growing number of tech startups, and the rising demand for analytics solutions in countries like China, India, and Japan. The competitive landscape and economic development in these regions are creating ample opportunities for market expansion.



    Component Analysis



    The data science platform market, segmented by components, includes platforms and services. The platform segment encompasses software and tools designed for data integration, preparation, and analysis, while the services segment covers professional and managed services that support the implementation and maintenance of these platforms. The platform component is crucial as it provides the backbone for data science operations, enabling data scientists to perform data wrangling, model building, and deployment efficiently. The increasing demand for customized solutions tailored to specific business needs is driving the growth of the platform segment. Additionally, with the rise of open-source platforms, organizations have more flexibility and control over their data science workflows, further propelling this segment.



    On the other hand, the services segment is equally vital as it ensures that organizations can effectively deploy and utilize data science platforms. Professional services include consulting, training, and support, which help organizations in the seamless integration of data science solutions into their existing IT infrastructure. Managed services provide ongoing support and maintenance, ensuring data science platforms operate optimally. The rising complexity of data ecosystems and the shortage of skilled data scientists are factors contributing to the growth of the services segment, as organizations often rely on external expertise.

  19. Data from: Create a Story Map

    • hubexamples-dcdev.hub.arcgis.com
    Updated Jul 16, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ESRI R&D Center (2018). Create a Story Map [Dataset]. https://hubexamples-dcdev.hub.arcgis.com/datasets/create-a-story-map
    Explore at:
    Dataset updated
    Jul 16, 2018
    Dataset provided by
    Esrihttp://esri.com/
    Authors
    ESRI R&D Center
    Description

    DO NOT DELETE OR MODIFY THIS ITEM. This item is managed by the ArcGIS Hub application. Create your own initiative by combining existing applications with a custom site.

  20. Polygon shapefile of data sources used to create a composite multibeam...

    • data.usgs.gov
    • catalog.data.gov
    Updated Aug 23, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Peter Dartnell; James Conrad; Janet Watt; Jenna Hill (2021). Polygon shapefile of data sources used to create a composite multibeam bathymetry surface of the southern Cascadia Margin offshore Oregon and northern California [Dataset]. http://doi.org/10.5066/P9C5DBMR
    Explore at:
    Dataset updated
    Aug 23, 2021
    Dataset provided by
    United States Geological Surveyhttp://www.usgs.gov/
    Authors
    Peter Dartnell; James Conrad; Janet Watt; Jenna Hill
    License

    U.S. Government Workshttps://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    1996 - 2019
    Area covered
    Northern California, Oregon, California
    Description

    This polygon shapefile describes the data sources used to create a composite 30-m resolution multibeam bathymetry surface of southern Cascadia Margin offshore Oregon and northern California.
