95 datasets found
  1. Big Data Intelligence Engine Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 21, 2025
    Cite
    Data Insights Market (2025). Big Data Intelligence Engine Report [Dataset]. https://www.datainsightsmarket.com/reports/big-data-intelligence-engine-1991939
    Explore at:
    Available download formats: ppt, doc, pdf
    Dataset updated
    May 21, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Big Data Intelligence Engine market is experiencing robust growth, driven by the increasing need for advanced analytics across diverse sectors. The market's expansion is fueled by several key factors: the exponential growth of data volume from sources such as IoT devices and social media, the rising adoption of cloud computing for data storage and processing, and the increasing demand for real-time insights to support faster and more informed decision-making. Applications spanning data mining, machine learning, and artificial intelligence contribute significantly to this expansion, as does the rising adoption of programming languages well suited to big data processing, such as Java, Python, and Scala. Technological advancements, such as more efficient and scalable algorithms and the emergence of specialized hardware like GPUs, also play a crucial role. While data security and privacy concerns, along with the high initial investment costs of Big Data Intelligence Engine solutions, pose some restraints, the overall market outlook remains extremely positive.

    The competitive landscape is dominated by a mix of established technology giants such as IBM, Microsoft, Google, and Amazon, and emerging players such as Alibaba Cloud, Tencent Cloud, and Baidu Cloud. These companies are investing aggressively in research and development to enhance their offerings and expand their market share. The market is geographically diverse, with North America and Europe currently holding significant shares. However, the Asia-Pacific region, particularly China and India, is expected to see the fastest growth in the coming years due to increasing digitalization and government initiatives promoting technological advancement.

    This growth is further segmented by application (Data Mining, Machine Learning, AI) and programming language (Java, Python, Scala), offering opportunities for specialized solutions and services. The forecast period of 2025-2033 promises substantial growth, driven by continued innovation and widespread adoption across industries.

  2. Large data files for 3011979 Python demo

    • datasetcatalog.nlm.nih.gov
    Updated Dec 1, 2023
    Cite
    Sriswasdi, Sira (2023). Large data files for 3011979 Python demo [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001018698
    Explore at:
    Dataset updated
    Dec 1, 2023
    Authors
    Sriswasdi, Sira
    Description

    These are demo data files used to teach machine learning with Python in the 3011979 course at Chulalongkorn University in Spring 2021 and Spring 2022.

  3. Student Skill Gap Analysis

    • data.mendeley.com
    Updated Apr 28, 2025
    Cite
    Bindu Garg (2025). Student Skill Gap Analysis [Dataset]. http://doi.org/10.17632/rv6scbpd7v.1
    Explore at:
    Dataset updated
    Apr 28, 2025
    Authors
    Bindu Garg
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is designed for skill gap analysis: evaluating the gap between students' current skills and industry requirements. It provides insights into technical skills, soft skills, career interests, and challenges, helping to identify areas for improvement.

    By leveraging this dataset, educators, recruiters, and researchers can assess students' job readiness and tailor training programs accordingly. It serves as a valuable resource for identifying skill deficiencies, improving career guidance, and enhancing curriculum design through targeted skill gap analysis.

    The column descriptors are as follows:
    • Name - Student's full name.
    • email_id - Student's email address.
    • Year - The academic year the student is currently in (e.g., 1st Year, 2nd Year).
    • Current Course - The course the student is currently pursuing (e.g., B.Tech CSE, MBA).
    • Technical Skills - Technical skills possessed by the student (e.g., Python, Data Analysis, Cloud Computing).
    • Programming Languages - Programming languages known by the student (e.g., Python, Java, C++).
    • Rating - Self-assessed rating of technical skills on a scale of 1 to 5.
    • Soft Skills - Soft skills (e.g., Communication, Leadership, Teamwork).
    • Rating - Self-assessed rating of soft skills on a scale of 1 to 5.
    • Projects - Whether the student has worked on any projects (Yes/No).
    • Career Interest - The student's preferred career path (e.g., Data Scientist, Software Engineer).
    • Challenges - Challenges faced while applying for jobs/internships (e.g., Lack of experience, Resume building issues).
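
    As a sketch of how such a file might be explored, the comma-separated skill columns can be split and a simple gap flag derived with pandas. The rows, the rating threshold, and the distinct names given to the two "Rating" columns here are illustrative assumptions, not part of the dataset:

```python
import pandas as pd

# Hypothetical rows following the column descriptors above; the two
# "Rating" columns are renamed ("Technical Rating" / "Soft Rating")
# to keep them distinct.
df = pd.DataFrame({
    "Name": ["Asha Rao", "Ravi Kumar"],
    "Technical Skills": ["Python, Data Analysis", "Cloud Computing"],
    "Technical Rating": [4, 2],
    "Soft Skills": ["Communication, Teamwork", "Leadership"],
    "Soft Rating": [3, 5],
    "Career Interest": ["Data Scientist", "Software Engineer"],
})

# Split the comma-separated skill lists into Python lists
df["Technical Skills"] = df["Technical Skills"].str.split(", ")

# Flag students whose self-assessed technical rating falls below a
# threshold, as one simple skill-gap indicator
df["Needs Upskilling"] = df["Technical Rating"] < 3
print(df[["Name", "Needs Upskilling"]])
```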

  4. Data from: Code4ML: a Large-scale Dataset of annotated Machine Learning Code...

    • zenodo.org
    Updated May 18, 2024
    + more versions
    Cite
    Ekaterina Trofimova; Emil Sataev; Anastasia Drozdova; Polina Guseva; Anna Scherbakova; Andrey Ustyuzhanin; Anastasia Gorodilova; Valeriy Berezovskiy (2024). Code4ML: a Large-scale Dataset of annotated Machine Learning Code [Dataset]. http://doi.org/10.5281/zenodo.11213783
    Explore at:
    Dataset updated
    May 18, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ekaterina Trofimova; Emil Sataev; Anastasia Drozdova; Polina Guseva; Anna Scherbakova; Andrey Ustyuzhanin; Anastasia Gorodilova; Valeriy Berezovskiy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is an enriched version of Code4ML: a Large-scale Dataset of annotated Machine Learning Code, a corpus of Python code snippets, competition summaries, and data descriptions from Kaggle. The initial corpus consists of ≈ 2.5 million snippets of ML code collected from ≈ 100 thousand Jupyter notebooks. A representative fraction of the snippets is annotated by human assessors through a user-friendly interface specially designed for that purpose.

    The data is organized as a set of tables in CSV format. It includes several central entities: raw code blocks collected from Kaggle (code_blocks.csv), kernel metadata (kernels_meta.csv), and competition metadata (competitions_meta.csv). Manually annotated code blocks are presented in a separate table (murkup_data.csv). As this table contains the numeric id of each code block's semantic type, we also provide a mapping from the id to semantic class and subclass (vertices.csv).

    Snippet information (code_blocks.csv) can be mapped to kernel metadata via kernel_id. Kernel metadata is linked to Kaggle competition information through comp_name. To ensure data quality, kernels_meta.csv includes only notebooks with an available Kaggle score.

    Automatic classifications of code blocks are stored in data_with_preds.csv. This table can be mapped to code_blocks.csv through the code_blocks_index column, which corresponds to code_blocks indices.

    The updated Code4ML 2.0 corpus includes kernels retrieved from Code Kaggle Meta. These kernels correspond to Kaggle competitions launched since 2020. The natural language descriptions of the competitions are retrieved with the aid of an LLM.

    kernels_meta2.csv may contain kernels without a Kaggle score but with a leaderboard position (rank).

    Code4ML 2.0 dataset can be used for various purposes, including training and evaluating models for code generation, code understanding, and natural language processing tasks.
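
    A minimal sketch of the joins described above, using toy in-memory stand-ins for the CSV tables (the column subsets shown are assumptions based on this description; the real tables are much richer):

```python
import pandas as pd

# Toy stand-ins for code_blocks.csv, kernels_meta.csv and
# competitions_meta.csv
code_blocks = pd.DataFrame({
    "kernel_id": [1, 1, 2],
    "code_block": ["import pandas as pd",
                   "df = pd.read_csv('x.csv')",
                   "print('hi')"],
})
kernels_meta = pd.DataFrame({
    "kernel_id": [1, 2],
    "comp_name": ["titanic", "digit-recognizer"],
    "kaggle_score": [0.78, 0.99],
})
competitions_meta = pd.DataFrame({
    "comp_name": ["titanic", "digit-recognizer"],
    "description": ["Predict survival", "Classify digits"],
})

# Snippets -> kernels via kernel_id, then kernels -> competitions
# via comp_name, as the description outlines
merged = (code_blocks
          .merge(kernels_meta, on="kernel_id")
          .merge(competitions_meta, on="comp_name"))
print(merged[["code_block", "comp_name", "kaggle_score"]])
```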

  5. Python code for the estimation of missing prices in real-estate market with...

    • data.mendeley.com
    Updated Dec 12, 2017
    + more versions
    Cite
    Iván García-Magariño (2017). Python code for the estimation of missing prices in real-estate market with a dataset of house prices from Teruel city [Dataset]. http://doi.org/10.17632/mxpgf54czz.2
    Explore at:
    Dataset updated
    Dec 12, 2017
    Authors
    Iván García-Magariño
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Teruel
    Description

    This research data file contains the necessary software and the dataset for estimating the missing prices of house units. The approach combines several machine learning techniques (linear regression, support vector regression, k-nearest neighbors, and a multi-layer perceptron neural network) with several dimensionality reduction techniques (non-negative factorization, recursive feature elimination, and feature selection with a variance threshold). It includes the input dataset formed from the house prices available in two neighborhoods of Teruel city (Spain) on November 13, 2017 on the Idealista website. These two neighborhoods are the center of the city and “Ensanche”.

    This dataset supports the authors' research on improving the setup of agent-based simulations of the real-estate market. The work based on this dataset has been submitted for consideration for publication to a scientific journal.

    The open-source Python code comprises all the files with the “.py” extension. The main program can be executed from the “main.py” file. The file “boxplotErrors.eps” is a chart generated by the code that compares the results of the different combinations of machine learning techniques and dimensionality reduction methods.

    The dataset is in the “data” folder. The raw input house price data are in the “dataRaw.csv” file; these were shuffled into the “dataShuffled.csv” file. We used cross-validation to obtain the house price estimations. The output estimations, alongside the real values, are stored in separate files in the “data” folder, each named with the abbreviations of the machine learning technique and the dimensionality reduction method used.
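
    A sketch of the general approach, pairing a feature selection step with a regressor in a pipeline and scoring it with cross-validation, using synthetic data rather than the Teruel dataset; this is an illustration of the technique, not the authors' code:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR

# Synthetic stand-in for house features and prices
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = X[:, 0] * 50_000 + 120_000 + rng.normal(scale=5_000, size=60)

# One pipeline per (dimensionality reduction, regressor) combination,
# scored with cross-validation as the description outlines
for name, reg in [("linear", LinearRegression()),
                  ("svr", SVR()),
                  ("knn", KNeighborsRegressor())]:
    pipe = Pipeline([("select", VarianceThreshold()), ("reg", reg)])
    scores = cross_val_score(pipe, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.2f}")
```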

  6. Evaluation of future trends of scientific research

    • stemfellowship.figshare.com
    png
    Updated Jan 30, 2017
    Cite
    Charlie Sun; Kerry Li; Zhenyu Li (2017). Evaluation of future trends of scientific research [Dataset]. http://doi.org/10.6084/m9.figshare.4595452.v1
    Explore at:
    Available download formats: png
    Dataset updated
    Jan 30, 2017
    Dataset provided by
    STEM Fellowship Big Data Challenge
    Authors
    Charlie Sun; Kerry Li; Zhenyu Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The rising volume of scientific research has drawn more people's attention to it, but the phrase "scientific research" alone does not capture its whole nature: like anything else, it divides into many fields. The various fields of scientific research have been discussed in many scholarly articles and evaluated by previous censuses and studies. However, the central question remains unanswered: what is the most popular field of scientific research, and which one will become the focus in the future? Although the number of specific subfields is too vast to count, several major fields can be identified, such as astronomy, engineering, computer science, medicine, biology, and chemistry. Popularity relates to several main factors, such as the number of articles in each field, the number of posts on social media, and the number of views on professional sites. A program was developed to analyze the relationship between research subjects and their future trends based on the number of mentions of each field, scholarly articles, and quotations about them. The program uses data from Altmetric, an authoritative data source. SAS is used to analyze the data and produce several graphs representing the value of each factor.

    Finally, suggestions for future scientific research can be summarized and inferred from the results of this work, which is intended to provide guidance for future research directions.

    Figure captions:
    • Fig 1 - The functions used in this research.
    • Fig 2 - The main Python program used in this research.
    • Fig 3 - The structure of the output.
    • Fig 4 - Factor 1: Number of articles relating to each field.
    • Fig 5 - Factor 2: Number of views on Mendeley, Connotea, and CiteULike.
    • Fig 6 - Factor 3: Number of posts on Facebook and Twitter.
    • Fig 7 - The correlation between individual factors.

  7. Using Python Packages and HydroShare to Advance Open Data Science and...

    • hydroshare.org
    • beta.hydroshare.org
    zip
    Updated Sep 28, 2023
    Cite
    Jeffery S. Horsburgh; Amber Spackman Jones; Anthony M. Castronova; Scott Black (2023). Using Python Packages and HydroShare to Advance Open Data Science and Analytics for Water [Dataset]. https://www.hydroshare.org/resource/4f4acbab5a8c4c55aa06c52a62a1d1fb
    Explore at:
    Available download formats: zip (31.0 MB)
    Dataset updated
    Sep 28, 2023
    Dataset provided by
    HydroShare
    Authors
    Jeffery S. Horsburgh; Amber Spackman Jones; Anthony M. Castronova; Scott Black
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Scientific and management challenges in the water domain require synthesis of diverse data. Many data analysis tasks are difficult because datasets are large and complex; standard data formats are not always agreed upon or mapped to efficient structures for analysis; scientists may lack training for tackling large and complex datasets; and it can be difficult to share, collaborate around, and reproduce scientific work. Overcoming barriers to accessing, organizing, and preparing datasets for analyses can transform the way water scientists work. Building on the HydroShare repository’s cyberinfrastructure, we have advanced two Python packages that make data loading, organization, and curation for analysis easier, reducing time spent in choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS) (i.e., a Python equivalent of USGS’ R dataRetrieval package), loading data into performant structures that integrate with existing visualization, analysis, and data science capabilities available in Python, and writing analysis results back to HydroShare for sharing and publication. While these Python packages can be installed for use within any Python environment, we will demonstrate how the technical burden for scientists associated with creating a computational environment for executing analyses can be reduced and how sharing and reproducibility of analyses can be enhanced through the use of these packages within CUAHSI’s HydroShare-linked JupyterHub server.

    This HydroShare resource includes all of the materials presented in a workshop at the 2023 CUAHSI Biennial Colloquium.

  8. Python-DPO-Large

    • huggingface.co
    Updated Mar 15, 2023
    + more versions
    Cite
    NextWealth Entrepreneurs Private Limited (2023). Python-DPO-Large [Dataset]. https://huggingface.co/datasets/NextWealth/Python-DPO-Large
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Mar 15, 2023
    Dataset authored and provided by
    NextWealth Entrepreneurs Private Limited
    Description

    Dataset Card for Python-DPO

    This dataset is the larger version of the Python-DPO dataset and was created using Argilla.

      Load with datasets
    

    To load this dataset with the datasets library, install it with pip install datasets --upgrade and then run:

    from datasets import load_dataset

    ds = load_dataset("NextWealth/Python-DPO")

      Data Fields
    

    Each data instance contains:

    instruction: The problem description/requirements chosen_code:… See the full description on the dataset page: https://huggingface.co/datasets/NextWealth/Python-DPO-Large.

  9. Zegami user manual for data exploration: "Systematic analysis of YFP gene...

    • zenodo.org
    pdf, zip
    Updated Jul 17, 2024
    + more versions
    Cite
    Maria Kiourlappou; Stephen Taylor; Ilan Davis (2024). Zegami user manual for data exploration: "Systematic analysis of YFP gene traps reveals common discordance between mRNA and protein across the nervous system" [Dataset]. http://doi.org/10.5281/zenodo.6374012
    Explore at:
    Available download formats: pdf, zip
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Maria Kiourlappou; Stephen Taylor; Ilan Davis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The explosion in biological data generation challenges the available technologies and methodologies for data interrogation. Moreover, highly rich and complex datasets, together with diverse linked data, are difficult to explore when provided in flat files. Here we provide a way to filter and analyse, in a systematic way, a dataset with more than 18 thousand data points using Zegami, a solution for interactive data visualisation and exploration. The primary data we use are derived from "Systematic analysis of 200 YFP gene traps reveals common discordance between mRNA and protein across the nervous system", which is submitted elsewhere. This manual provides the raw image data together with annotations and associated data, and explains, with specific examples, how to use Zegami to explore all these data types together. We also provide the open-source Python code used to annotate the figures.

  10. Data from: ManyTypes4Py: A benchmark Python Dataset for Machine...

    • zenodo.org
    • data.europa.eu
    zip
    Updated Aug 24, 2021
    + more versions
    Cite
    Amir M. Mir; Evaldas Latoskinas; Georgios Gousios (2021). ManyTypes4Py: A benchmark Python Dataset for Machine Learning-Based Type Inference [Dataset]. http://doi.org/10.5281/zenodo.4571228
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 24, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Amir M. Mir; Evaldas Latoskinas; Georgios Gousios
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    • The dataset was gathered on Sep. 17th, 2020. It contains more than 5.4K Python repositories hosted on GitHub. Check the file ManyTypes4PyDataset.spec for repository URLs and their commit SHAs.
    • The dataset is de-duplicated using the CD4Py tool. The list of duplicate files is provided in the duplicate_files.txt file.
    • All of its Python projects are processed into JSON-formatted files, which contain a seq2seq representation of each file, type-related hints, and information for machine learning models. The structure of the JSON-formatted files is described in the JSONOutput.md file.
    • The dataset is split into train, validation, and test sets by source code file. The list of files and their corresponding set is provided in the dataset_split.csv file.
    • Notable changes to each version of the dataset are documented in CHANGELOG.md.
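
    As an illustration, a split file of this kind can be read into per-set file lists with the standard library; the column names used here are assumptions, since the actual header of dataset_split.csv may differ:

```python
import csv
import io

# A toy stand-in for dataset_split.csv; the real column names and
# paths may differ.
split_csv = io.StringIO(
    "set,filename\n"
    "train,repo_a/module.py\n"
    "valid,repo_b/utils.py\n"
    "test,repo_c/main.py\n"
)

# Group source files by their assigned set (train/valid/test)
files_by_set = {}
for row in csv.DictReader(split_csv):
    files_by_set.setdefault(row["set"], []).append(row["filename"])

print({k: len(v) for k, v in files_by_set.items()})
```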
  11. Big Data Certification KR

    • kaggle.com
    zip
    Updated Nov 29, 2021
    Cite
    KIM TAE HEON (2021). Big Data Certification KR [Dataset]. https://www.kaggle.com/agileteam/bigdatacertificationkr
    Explore at:
    Available download formats: zip (15840 bytes)
    Dataset updated
    Nov 29, 2021
    Authors
    KIM TAE HEON
    License

    Attribution-NoDerivs 4.0 (CC BY-ND 4.0): https://creativecommons.org/licenses/by-nd/4.0/
    License information was derived automatically

    Description

    A playground for preparing for the practical portion of the Big Data Analytics Certification exam

    Shall we play together? "The mugunghwa flower has bloomed" 😜 This is a dataset for preparing for the practical portion of the Big Data Analytics Certification exam. If you write better code, please share it 🎉 (Both Python and R are welcome.)

    Exam 4 past-question types

    Exam 3 past-question types and advanced study materials

    🆕 New problems added June 2022

    🎁 Introductory course for the Big Data Analytics Certification practical exam now open 🎁

    • https://class101.page.link/tp9k
    • An introductory course has been opened 👍
    • It covers just what you need for the practical exam: Python, pandas, machine learning, mock problems (Task types 1 and 2), useful tips, and more 🎉
    • If you have already done machine learning, you probably do not need to take it; it is recommended for true beginners who need explanations before they can attempt the mock problems!

    📌 Task type 1 practice problems (P: Python, R)

    See the Tasks tab for problems and code

    📌 Task type 2 practice problems

    See the Tasks tab for problems and code - [Exam 3 past-question type, Task 2]: travel insurance package product (the data was made slightly harder) P: https://www.kaggle.com/code/agileteam/3rd-type2-3-2-baseline

    📌 6-week study course (see the table below)

    Week | Task type (editor) | Problem numbers
    6 weeks out | Task 1 (notebook) | T1-1~5
    5 weeks out | Task 1 (notebook) | T1-6~9, T1 EQ (past questions)
    4 weeks out | Task 1 (script), Task 2 (notebook) | T1-10~13, T1.Ex, T2EQ, T2-1
    3 weeks out | Task 1 (script), Task 2 (notebook) | T1-14~19, T2-2~3
    2 weeks out | Task 1 (script), Task 2 (script) | T1-20~21, T2-4~6, review
    1 week out | Task 1, Task 2 (script), short answers | T1-22~24, mock exam, review, exam-environment trial, short answers

    📌 Machine learning tutorial for beginners (selected from notebooks shared by the community 👍)

    - https://www.kaggle.com/ohseokkim/t2-2-pima-indians-diabetes Author: @ohseokkim 😆

  12. (HS 2) Automate Workflows using Jupyter notebook to create Large Extent...

    • search.dataone.org
    • hydroshare.org
    Updated Oct 19, 2024
    + more versions
    Cite
    Young-Don Choi (2024). (HS 2) Automate Workflows using Jupyter notebook to create Large Extent Spatial Datasets [Dataset]. http://doi.org/10.4211/hs.a52df87347ef47c388d9633925cde9ad
    Explore at:
    Dataset updated
    Oct 19, 2024
    Dataset provided by
    Hydroshare
    Authors
    Young-Don Choi
    Description

    We implemented automated workflows using Jupyter notebooks for each state. The GIS processing, crucial for merging, extracting, and projecting GeoTIFF data, was performed using ArcPy, a Python package for geographic data analysis, conversion, and management within ArcGIS (Toms, 2015). After generating state-scale LES (large extent spatial) datasets in GeoTIFF format, we utilized the xarray and rioxarray Python packages to convert GeoTIFF to NetCDF. Xarray is a Python package for working with multi-dimensional arrays; rioxarray is the rasterio extension for xarray; and rasterio is a Python library for reading and writing GeoTIFF and other raster formats. Xarray facilitated data manipulation and metadata addition in the NetCDF file, while rioxarray was used to save the GeoTIFF data as NetCDF. These procedures resulted in the creation of three HydroShare resources (HS 3, HS 4 and HS 5) for sharing state-scale LES datasets. Notably, due to licensing constraints with ArcGIS Pro, a commercial GIS software, the Jupyter notebook development was undertaken on a Windows OS.
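
    A minimal sketch of the metadata-addition step with xarray, using a toy in-memory array in place of a GeoTIFF opened via rioxarray; the coordinates, variable name, and attributes are illustrative assumptions:

```python
import numpy as np
import xarray as xr

# Toy raster standing in for a state-scale GeoTIFF already read into
# memory (rioxarray.open_rasterio would normally produce the DataArray)
data = np.arange(12, dtype="float32").reshape(3, 4)
da = xr.DataArray(
    data,
    dims=("y", "x"),
    coords={"y": [40.2, 40.1, 40.0],
            "x": [-111.3, -111.2, -111.1, -111.0]},
    name="les",
)

# Metadata addition, as described, happens through the attrs mapping
da.attrs["units"] = "meters"
da.attrs["source"] = "state-scale LES mosaic (illustrative)"

# Writing to NetCDF requires a backend such as netCDF4 or scipy:
# da.to_netcdf("les_state.nc")
print(da.attrs["units"])
```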

  13. Data from: Generating Heterogeneous Big Data Set for Healthcare and...

    • data.mendeley.com
    Updated Jan 23, 2023
    Cite
    Omar Al-Obidi (2023). Generating Heterogeneous Big Data Set for Healthcare and Telemedicine Research Based on ECG, Spo2, Blood Pressure Sensors, and Text Inputs: Data set classified, Analyzed, Organized, And Presented in Excel File Format. [Dataset]. http://doi.org/10.17632/gsmjh55sfy.1
    Explore at:
    Dataset updated
    Jan 23, 2023
    Authors
    Omar Al-Obidi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A heterogeneous big dataset is presented in this work: an electrocardiogram (ECG) signal, a blood pressure signal, an oxygen saturation (SpO2) signal, and text inputs. This work extends our earlier dataset formulation presented in [1]; a trustworthy and relevant medical dataset library (PhysioNet [2]) was used to acquire the signals. The dataset includes medical features from heterogeneous sources (sensory and non-sensory). First, the ECG sensor signals, which contain QRS width, ST elevation, peak numbers, and cycle interval. Second, the SpO2 level from the SpO2 sensor signals. Third, the blood pressure sensor signals, which contain high (systolic) and low (diastolic) values. Finally, the text inputs, which constitute the non-sensory data and were formulated based on doctors' diagnostic procedures for chronic heart diseases. A Python software environment was used, and the simulated big data is presented along with analyses.

  14. Python Package Software Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Mar 26, 2025
    Cite
    Market Research Forecast (2025). Python Package Software Report [Dataset]. https://www.marketresearchforecast.com/reports/python-package-software-59302
    Explore at:
    Available download formats: ppt, doc, pdf
    Dataset updated
    Mar 26, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Python Package Software market is experiencing robust growth, driven by the increasing adoption of Python in various industries and the rising demand for efficient and specialized software solutions. The market's expansion is fueled by the large and active Python community constantly developing and refining packages for diverse applications, from web development and data science to machine learning and automation. While precise market sizing is unavailable, considering the widespread use of Python and the significant contribution of open-source packages, a reasonable estimate for the 2025 market size could be around $5 billion, projecting a Compound Annual Growth Rate (CAGR) of 15% over the forecast period (2025-2033). This growth is primarily driven by the increasing complexity of software projects demanding specialized functionality readily available through packages, the need for faster development cycles, and the cost-effectiveness of leveraging pre-built components.

    Key trends include the rise of cloud-based Python package management, the growing importance of security and maintainability in package selection, and the increasing specialization of packages for niche applications. Constraints on market growth might include challenges in ensuring package quality and security, as well as the learning curve associated with integrating and managing diverse packages within large projects. The market is segmented into cloud-based and web-based solutions, catering to large enterprises and SMEs, with North America and Europe currently holding the largest market shares. The diverse range of packages, from those focusing on data manipulation (Pandas, NumPy) and web frameworks (Django, Flask) to machine learning libraries (Scikit-learn, TensorFlow) and GUI development (Tkinter, PyQt), underscores the market's versatility. The significant contribution of open-source packages fosters a collaborative environment and continuous improvement.

    However, challenges remain in effectively managing the vast ecosystem of packages, addressing security vulnerabilities, and ensuring interoperability. Future growth will hinge on addressing these challenges, fostering standardization, and further improving the accessibility and user experience of Python package management systems. Continued innovation within the Python ecosystem and broader industry trends such as the rise of AI and big data will further propel the market's expansion.

  15. Multi-Dimensional Data Viewer (MDV) user manual for data exploration:...

    • zenodo.org
    • data.niaid.nih.gov
    pdf, zip
    Updated Jul 12, 2024
    + more versions
    Cite
    Maria Kiourlappou; Martin Sergeant; Joshua S. Titlow; Jeffrey Y. Lee; Darragh Ennis; Stephen Taylor; Ilan Davis (2024). Multi-Dimensional Data Viewer (MDV) user manual for data exploration: "Systematic analysis of YFP gene traps reveals common discordance between mRNA and protein across the nervous system" [Dataset]. http://doi.org/10.5281/zenodo.7738944
    Explore at:
    Available download formats: zip, pdf
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Maria Kiourlappou; Martin Sergeant; Joshua S. Titlow; Jeffrey Y. Lee; Darragh Ennis; Stephen Taylor; Ilan Davis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The explosion in the volume of biological imaging data challenges the available technologies for data interrogation and its intersection with related published bioinformatics data sets. Moreover, intersection of highly rich and complex datasets from different sources provided as flat csv files requires advanced informatics skills, which is time consuming and not accessible to all. Here, we provide a “user manual” for our new paradigm for systematically filtering and analysing a dataset with more than 1300 microscopy data figures using Multi-Dimensional Viewer (MDV: https://mdv.molbiol.ox.ac.uk), a solution for interactive multimodal data visualisation and exploration. The primary data we use are derived from our published systematic analysis of 200 YFP gene traps revealing common discordance between mRNA and protein across the nervous system (https://doi.org/10.1083/jcb.202205129). This manual provides the raw image data together with the expert annotations of the mRNA and protein distribution as well as associated bioinformatics data. We explain, with specific examples, how to use MDV to make the multiple data types interoperable and explore them together. We also provide the open-source Python code (github link) used to annotate the figures, which could be adapted to any other kind of data annotation task.

  16. f

    Comparison of the Predictive Performance and Interpretability of Random...

    • acs.figshare.com
    • figshare.com
    zip
    Updated Jun 5, 2023
    Richard L. Marchese Robinson; Anna Palczewska; Jan Palczewski; Nathan Kidley (2023). Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets [Dataset]. http://doi.org/10.1021/acs.jcim.6b00753.s006
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    ACS Publications
    Authors
    Richard L. Marchese Robinson; Anna Palczewska; Jan Palczewski; Nathan Kidley
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The ability to interpret the predictions made by quantitative structure–activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. 
These programs are the rfFC package (https://r-forge.r-project.org/R/?group_id=1725) for the R statistical programming language and the Python program HeatMapWrapper [https://doi.org/10.5281/zenodo.495163] for heat map generation.

  17. o

    Data from: ManyTypes4Py: A benchmark Python Dataset for Machine...

    • explore.openaire.eu
    • data.europa.eu
    Updated Sep 22, 2020
    + more versions
    Amir M. Mir; Evaldas Latoskinas; Georgios Gousios (2020). ManyTypes4Py: A benchmark Python Dataset for Machine Learning-Based Type Inference [Dataset]. http://doi.org/10.5281/zenodo.4601051
    Explore at:
    Dataset updated
    Sep 22, 2020
    Authors
    Amir M. Mir; Evaldas Latoskinas; Georgios Gousios
    Description

    The dataset was gathered on Sep. 17th, 2020 from GitHub. It has clean and complete versions (from v0.7): the clean version has 5.1K type-checked Python repositories and 1.2M type annotations; the complete version has 5.2K Python repositories and 3.3M type annotations. The dataset's source files are type-checked using mypy (clean version). The dataset is also de-duplicated using the CD4Py tool. Check out the README.MD file for a description of the dataset. Notable changes to each version of the dataset are documented in CHANGELOG.md. The dataset's scripts and utilities are available on its GitHub repository. Reference: A. Mir, E. Latoskinas and G. Gousios, "ManyTypes4Py: A Benchmark Python Dataset for Machine Learning-Based Type Inference," in 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), 2021, pp. 585-589. doi: 10.1109/MSR52588.2021.00079
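The raw material of a type-inference dataset like this is pairs of identifiers and their annotations, which can be harvested from source files with Python's own `ast` module. This sketch is illustrative of the extraction idea, not ManyTypes4Py's actual pipeline (which additionally type-checks with mypy and de-duplicates with CD4Py).

```python
import ast

# A toy source snippet; datasets of this kind pair identifiers
# like these with their annotations to train type-inference models.
source = '''
def greet(name: str, times: int) -> str:
    return name * times
'''

tree = ast.parse(source)
annotations = {}
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        for arg in node.args.args:
            if arg.annotation is not None:
                annotations[arg.arg] = ast.unparse(arg.annotation)
        if node.returns is not None:
            annotations["return"] = ast.unparse(node.returns)
print(annotations)  # parameter and return annotations as strings
```

`ast.unparse` requires Python 3.9+; on older interpreters a third-party unparser would be needed.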

  18. m

    A dataset for conduction heat transfer and deep learning

    • data.mendeley.com
    Updated Jun 25, 2020
    + more versions
    Mohammad Edalatifar (2020). A dataset for conduction heat transfer and deep learning [Dataset]. http://doi.org/10.17632/rw9yk3c559.1
    Explore at:
    Dataset updated
    Jun 25, 2020
    Authors
    Mohammad Edalatifar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Big data images for conduction heat transfer. The related paper has been published here: M. Edalatifar, M.B. Tavakoli, M. Ghalambaz, F. Setoudeh, Using deep learning to learn physics of conduction heat transfer, Journal of Thermal Analysis and Calorimetry, 2020. https://doi.org/10.1007/s10973-020-09875-6 Steps to reproduce: The dataset is saved in two formats, .npz for Python and .mat for MATLAB. The .mat file is large, so it is compressed with WinZip. ReadDataset_Python.py and ReadDataset_Matlab.m are examples of reading the data using Python and MATLAB, respectively. To use the dataset in MATLAB, download Dataset/HeatTransferPhenomena_35_58.zip, unzip it, and then use ReadDataset_Matlab.m as an example. For Python, download Dataset/HeatTransferPhenomena_35_58.npz and run ReadDataset_Python.py.
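The Python access pattern described above boils down to `numpy.load` on the .npz archive. Since the array names inside the real HeatTransferPhenomena_35_58.npz are not documented here, this sketch writes a small synthetic archive with a hypothetical `images` array and reads it back the same way ReadDataset_Python.py presumably does.

```python
import numpy as np

# Synthetic stand-in for Dataset/HeatTransferPhenomena_35_58.npz;
# the array name "images" is an assumption for illustration.
np.savez("demo_heat.npz", images=np.random.rand(4, 64, 64))

with np.load("demo_heat.npz") as data:
    print(list(data.files))   # names of the arrays stored in the archive
    images = data["images"]   # access one array by name
    print(images.shape)
```

Listing `data.files` first is a practical way to discover the archive's contents before indexing into it.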

  19. Z

    Data from: #PraCegoVer dataset

    • data.niaid.nih.gov
    Updated Jan 19, 2023
    + more versions
    Esther Luna Colombini (2023). #PraCegoVer dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5710561
    Explore at:
    Dataset updated
    Jan 19, 2023
    Dataset provided by
    Gabriel Oliveira dos Santos
    Esther Luna Colombini
    Sandra Avila
    Description

    Automatically describing images using natural sentences is an essential task for the inclusion of visually impaired people on the Internet. Although there are many datasets in the literature, most of them contain only English captions; datasets with captions in other languages are scarce.

    #PraCegoVer arose on the Internet as a movement encouraging social media users to publish images, tag them #PraCegoVer, and add a short description of their content. Inspired by this movement, we have proposed #PraCegoVer, a multi-modal dataset with Portuguese captions based on posts from Instagram. It is the first large dataset for image captioning in Portuguese with freely annotated images.

    #PraCegoVer has 533,523 image–caption pairs described in Portuguese, collected from more than 14 thousand different profiles. The average caption length is 39.3 words, with a standard deviation of 29.7.

    Dataset Structure

    The #PraCegoVer dataset is composed of the main file dataset.json and a collection of compressed files named images.tar.gz.partX containing the images. The file dataset.json comprises a list of JSON objects with the attributes:

    user: anonymized user that made the post;

    filename: image file name;

    raw_caption: raw caption;

    caption: clean caption;

    date: post date.

    Each instance in dataset.json is associated with exactly one image in the images directory, whose file name is given by the attribute filename. We also provide a sample with five instances, so users can get an overview of the dataset before downloading it completely.
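Given the attribute list above, pairing records with their image files is a straightforward JSON traversal. The records below are hypothetical examples following the documented schema, not actual dataset entries.

```python
import json

# Hypothetical excerpt of dataset.json, using the documented attributes:
# user, filename, raw_caption, caption, date.
records = json.loads('''[
  {"user": "u1", "filename": "img_001.jpg",
   "raw_caption": "#PraCegoVer foto de um gato",
   "caption": "foto de um gato", "date": "2020-01-15"},
  {"user": "u2", "filename": "img_002.jpg",
   "raw_caption": "#PraCegoVer praia ao por do sol",
   "caption": "praia ao por do sol", "date": "2020-02-03"}
]''')

# Each record points at exactly one image via "filename",
# so a filename -> caption lookup table is a natural index.
caption_by_image = {r["filename"]: r["caption"] for r in records}
print(caption_by_image["img_001.jpg"])
```

The same loop would read the real dataset.json, with the image files resolved against the uncompressed images directory.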

    Download Instructions

    If you just want to have an overview of the dataset structure, you can download sample.tar.gz. But, if you want to use the dataset, or any of its subsets (63k and 173k), you must download all the files and run the following commands to uncompress and join the files:

    cat images.tar.gz.part* > images.tar.gz
    tar -xzvf images.tar.gz

    Alternatively, you can download the entire dataset from the terminal using the python script download_dataset.py available in PraCegoVer repository. In this case, first, you have to download the script and create an access token here. Then, you can run the following command to download and uncompress the image files:

    python download_dataset.py --access_token=

  20. e

    sdaas - a Python tool computing an amplitude anomaly score of seismic data...

    • b2find.eudat.eu
    Updated Jun 29, 2007
    (2007). sdaas - a Python tool computing an amplitude anomaly score of seismic data and metadata using simple machine learning algorithm - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/b0ff5f26-69b6-597b-a879-299e3c5118f1
    Explore at:
    Dataset updated
    Jun 29, 2007
    Description

    The increasingly high number of big data applications in seismology has made quality-control tools to filter, discard, or rank data extremely important. In this framework, machine learning algorithms, already established in several seismic applications, are good candidates to perform the task flexibly and efficiently. sdaas (seismic data/metadata amplitude anomaly score) is a Python library and command-line tool for detecting a wide range of amplitude anomalies in any seismic waveform segment, such as recording artifacts (e.g., anomalous noise, peaks, gaps, spikes), sensor problems (e.g., digitizer noise), and metadata field errors (e.g., wrong stage gain in StationXML). The underlying machine learning model, based on the isolation forest algorithm, has been trained and tested on a broad variety of seismic waveforms of different lengths, from local to teleseismic earthquakes to noise recordings, from both broadband sensors and accelerometers. For this reason, the software assures a high degree of flexibility and ease of use: for any given input (a waveform in miniSEED format and its metadata as StationXML, given either as file paths or FDSN URLs), the computed anomaly score is a probability-like numeric value in [0, 1] indicating the degree of belief that the analyzed waveform represents an anomaly (or outlier), where scores ≤0.5 indicate no distinct anomaly. sdaas can be employed to filter malformed data in a pre-processing routine, to assign robustness weights, or as a metadata checker by computing scores on randomly selected segments from a given station/channel: in this case, a persistent sequence of high scores clearly indicates problems in the metadata.
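The isolation-forest idea behind sdaas can be sketched with scikit-learn. This is not sdaas's trained model or its actual waveform features: the two-dimensional "amplitude features" below are synthetic, and the rescaling of raw scores into [0, 1] is a simple min–max mapping chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 2))   # typical waveform features
spikes = rng.normal(8.0, 1.0, size=(5, 2))     # artifact-like outliers

# Fit on normal data; score_samples is higher for more "normal" points.
model = IsolationForest(random_state=0).fit(normal)
raw = model.score_samples(np.vstack([normal, spikes]))

# Map raw scores to [0, 1] so that higher means more anomalous,
# mimicking sdaas's probability-like output.
scores = (raw.max() - raw) / (raw.max() - raw.min())
print(scores[-5:])  # the injected outliers should score near the top
```

In sdaas itself the features are derived from the waveform and its StationXML metadata, and the score calibration is part of the released model rather than an ad-hoc rescaling.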

This growth is further segmented by application (Data Mining, Machine Learning, AI) and programming languages (Java, Python, Scala), offering opportunities for specialized solutions and services. The forecast period of 2025-2033 promises substantial growth, driven by continued innovation and widespread adoption across industries.
