6 datasets found
  1. MEDICINA-corpus_reducido+MIR+wiki

    • kaggle.com
    Updated May 8, 2023
    Manuel González Martínez (2023). MEDICINA-corpus_reducido+MIR+wiki [Dataset]. https://www.kaggle.com/datasets/manuelgonzlezmartnez/medicina-corpus-reducido-mir-wiki
    Dataset updated
    May 8, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Manuel González Martínez
    Description

    This dataset contains the tokenized version of a dataset made up of 60% of the Spanish OSCAR corpus, wiki data from multiple countries, and medicine books. Because the full dataset is so large, I had to cut the OSCAR corpus to make it somewhat smaller. For the same reason, I uploaded the tokenized version: if you want or need to work with this dataset inside Kaggle, there is not enough space there to tokenize it yourself.

    I have also uploaded the code used to tokenize the dataset.

    If you want me to upload the entire dataset divided into 4 parts, just ask. :)

  2. Global IT Information Technology Market Report 2025 Edition, Market Size,...

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Cognitive Market Research, Global IT Information Technology Market Report 2025 Edition, Market Size, Share, CAGR, Forecast, Revenue [Dataset]. https://www.cognitivemarketresearch.com/it-information-technology-market-report
    Available download formats: pdf, excel, csv, ppt
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    According to Cognitive Market Research, the global Information Technology market size was USD XX million in 2024 and is expected to reach USD XX million by the end of 2033, growing at a CAGR of XX% from 2025 to 2033.
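    The report leaves its figures as XX placeholders, so the relationship between a starting size, an ending size, and the CAGR can only be illustrated with made-up numbers. A minimal Python sketch, assuming the standard definition CAGR = (end / start)^(1 / years) - 1; how the report counts the years in its 2025 to 2033 window is not stated, so the `years` argument is an assumption the reader has to set:

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate that turns start_value into end_value over `years` periods."""
    if start_value <= 0 or end_value < 0 or years <= 0:
        raise ValueError("start_value and years must be positive, end_value non-negative")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: float) -> float:
    """Value reached by compounding start_value at `rate` per year for `years` years."""
    return start_value * (1.0 + rate) ** years


if __name__ == "__main__":
    # Hypothetical numbers (the report's figures are XX placeholders):
    # a market growing from 100 to 121 over 2 years has a 10% CAGR.
    rate = cagr(100.0, 121.0, 2)
    print(f"{rate:.1%}")                      # 10.0%
    print(round(project(100.0, rate, 2), 2))  # 121.0
```

    Feeding the computed CAGR back into `project` reproduces the end value, which is a quick sanity check when reading forecasts of this kind.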

    North America held the largest share, xx%, in 2024.
    Europe held a share of xx% in 2024.
    Asia-Pacific held a significant share of xx% in 2024.
    South America held a significant share of xx% in 2024.
    Middle East and Africa held a significant share of xx% in 2024.

    Market Dynamics of IT Information Technology Market

    Key Drivers of IT Information Technology Market

    The Growing Adoption of Cloud Computing, Artificial Intelligence, and Big Data

    The extensive incorporation of cutting-edge digital technologies—cloud computing, AI, and big data—serves as a key catalyst for the growth of the IT market. Cloud computing provides businesses with scalable and adaptable infrastructure, AI enhances operational efficiency through automation and predictive analytics, and big data supports informed decision-making. For example, Atera’s collaboration with Azure OpenAI facilitates predictive issue resolution and significantly enhances IT productivity. These technologies are transforming workflows across various industries and driving innovation, ensuring that the IT sector remains at the forefront of global digital transformation.

    Source: https://www.microsoft.com/en/customers/story/1662731177894407321-atera-professional-services-azure-en-israel

    The Transformative Influence of IoT is Enhancing the Global IT Sector

    The rapid proliferation of Internet of Things (IoT) devices—projected to exceed 16.6 billion by the close of 2023—has intensified the demand for IT infrastructure, services, and analytics. IoT fosters real-time data gathering, automation, and predictive maintenance in sectors such as healthcare, manufacturing, and smart cities. The immense data produced by interconnected devices is propelling advancements in AI, cloud computing, and edge computing. With increasing investments in 5G and digital infrastructure, IoT continues to serve as a vital enabler of IT market growth on a global scale.

    (Source: https://iot-analytics.com/product/state-of-iot-summer-2024/)

    Key Restraints in IT Information Technology Market

    Growing Concerns Regarding Data Privacy are Impeding IT Market Expansion

    High-profile cyber incidents, such as the 2021 Microsoft Exchange Server breach, have triggered considerable anxiety regarding data security. Consumer apprehensions about surveillance, unauthorized access, and the corporate misuse of personal data are on the rise. According to Deloitte, almost 60% of consumers express concerns about security breaches, with trust in corporate data management notably diminished. This situation has prompted demands for more stringent privacy regulations and may hinder digital adoption due to heightened compliance requirements and public skepticism.

    (Sources: https://www2.deloitte.com/us/en/insights/industry/telecommunications/connectivity-mobile-trends-survey/2023/data-privacy-and-security.html

    https://en.wikipedia.org/wiki/WannaCry_ransomware_attack)

    Cybersecurity Threats and the Escalation of Attack Complexity

    The emergence of increasingly sophisticated cyber threats, such as ransomware like WannaCry, poses a persistent challenge for the IT industry. Cybercriminals exploit weaknesses in essential systems, causing financial losses, data breaches, and reputational damage. Tackling cybersecurity requires ongoing investment in threat detection, endpoint security, and regulatory compliance. These evolving threats not only increase operational expenses but also discourage smaller enterprises from adopting advanced IT solutions for fear of being exposed.

    Key Trends of IT Information Technology Market

    Expansion of Edge Computing to Facilitate Real-Time Applications

    As IoT and smart devices become more prevalent, edge computing is gaining traction by processing data closer to its source. This reduces latency and improves response times, making it well suited to real-time applications such as autonomous vehicles, smart manufacturing, and augmented reality. The shift toward edge infrastructure is reshaping IT architectures to better balance cloud and on-premises computing requirements.

    Increase i...

  3. Wikia census / Fandom census

    • kaggle.com
    zip
    Updated Oct 19, 2018
    Abel Serrano Juste (2018). Wikia census / Fandom census [Dataset]. https://www.kaggle.com/abeserra/wikia-census
    Available download formats: zip (87,833,068 bytes)
    Dataset updated
    Oct 19, 2018
    Authors
    Abel Serrano Juste
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Context

    A census of all the wikis hosted on Wikia (now renamed Fandom). The dataset contains data on more than 300 thousand wikis, such as language, topic, and the number of users, admins, articles, edits, pages, users with a certain number of contributions, bots, etc.

    A study of this data was presented at the OpenSym 2018 conference. You can find the Jupyter notebook code for that study under the "Kernels" section.

    Content

    There are several data files:

    • wikia_stats.csv: general data about each wiki.
    • wikia_stats_users.csv: general data about each wiki plus the number of human registered users, categorized according to the number of edits in the last 30 days (Users_N).
    • wikia_stats_users_birthdate.csv: all the data above plus the estimated date of birth of each wiki.

    If you are just looking for the whole dataset corresponding to the Wikia census, use the wikia_stats_users_birthdate.csv file.
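    A minimal sketch of filtering wikis by activity with Python's standard `csv` module. The column names used here (`url`, and `users_1`, `users_5` as examples of the Users_N pattern) are hypothetical stand-ins: the real headers of wikia_stats_users_birthdate.csv are not listed in this description, so check the file header before use.

```python
import csv
import io
from typing import Iterable, List

# Hypothetical header names; the real file may use different ones (e.g. "Users_1").
URL_COL = "url"


def user_bucket_columns(fieldnames: Iterable[str]) -> List[str]:
    """Return the per-activity user columns, assumed to follow a Users_N naming pattern."""
    return [name for name in fieldnames if name.lower().startswith("users_")]


def wikis_with_active_users(lines: Iterable[str], column: str, minimum: int) -> List[str]:
    """List the URL of every wiki whose `column` count is at least `minimum`."""
    selected = []
    for row in csv.DictReader(lines):
        # Empty or missing cells count as 0; float() tolerates values exported as "3.0".
        value = int(float(row.get(column) or 0))
        if value >= minimum:
            selected.append(row[URL_COL])
    return selected


if __name__ == "__main__":
    SAMPLE = (
        "url,language,users_1,users_5\n"
        "http://a.wikia.com,en,12,3\n"
        "http://b.wikia.com,es,0,0\n"
    )
    header = csv.DictReader(io.StringIO(SAMPLE)).fieldnames
    print(user_bucket_columns(header))                                      # ['users_1', 'users_5']
    print(wikis_with_active_users(io.StringIO(SAMPLE), "users_1", 1))      # ['http://a.wikia.com']
```

    With the real file, an open file object can be passed instead of the in-memory sample, for example `open("wikia_stats_users_birthdate.csv", newline="", encoding="utf-8")` (UTF-8 is an assumption). Rows are read one at a time, so the whole file never has to sit in memory.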

    The other two .txt files contain (name, url) pairs: the raw index crawled from the Wikia sitemap, and the corresponding curated index containing only the working wikis.

    The data for this second version was collected in October 2018; the first version was collected in February 2018.

    The data was collected using the scripts located here: https://github.com/Grasia/wiki-scripts

    Wikia does not clearly state the license of this data: the data is publicly available on its website, but its license policy does not address it.

    Acknowledgements

    All the data is available thanks to FANDOM, the company behind Wikia, and thanks to all the contributors to the wikis.

    Inspiration

    We want to find the patterns that characterize a healthy and sustainable online community.

    Wikia is a huge ecosystem of such communities, where small, medium, and big as well as young and old communities coexist, so it is an ideal setting for studying online collaboration.

    License

    This data is released under the Creative Commons Attribution-Share Alike License 3.0 (Unported) (CC-BY-SA). Please attribute FANDOM (The company behind Wikia) and me (Abel Serrano Juste) when using this data.

  4. Multiple Single Cell RNA Expressions ARCHS4

    • kaggle.com
    zip
    Updated Jul 25, 2021
    Alexander Chervov (2021). Multiple Single Cell RNA Expressions ARCHS4 [Dataset]. https://www.kaggle.com/datasets/alexandervc/multiple-single-cell-rna-expressions-archs4/data
    Available download formats: zip (23,319,014,182 bytes)
    Dataset updated
    Jul 25, 2021
    Authors
    Alexander Chervov
    Description

    Remark: for cell cycle analysis, see the paper "Computational challenges of cell cycle analysis using single cell transcriptomics" by Alexander Chervov and Andrei Zinovyev: https://arxiv.org/abs/2208.05229

    Context

    The dataset was downloaded from https://amp.pharm.mssm.edu/archs4/download.html. The methods are described in the Nature Communications paper https://www.nature.com/articles/s41467-018-03751-6.

    The ARCHS4 data provides user-friendly access to multiple gene expression datasets from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). While most of the data in GEO is stored in raw formats, ARCHS4 provides prepared expression count matrices. And while GEO stores data separately for each research paper, ARCHS4 collects all the information in a single matrix. See the main site for further information.

    The main data files are in H5 (HDF5, Hierarchical Data Format) format: https://en.wikipedia.org/wiki/Hierarchical_Data_Format. They contain expression data as well as annotation data and further meta-information. There are several other auxiliary files, such as a t-SNE 3D projection (in CSV format) and gene correlation matrices for human and mouse in Feather format.

    Content

    The main file for human, human_matrix.h5, contains the data matrix (238,522 samples × 35,238 genes) as well as various meta-information: gene names, sample information (tissue, etc.), and references to GEO database IDs where all the details can be found.
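    Before opening human_matrix.h5, it helps to estimate how large the matrix would be if loaded densely. A short Python sketch of that arithmetic, assuming 4 bytes per value (for example, 32-bit integer counts; the actual storage type is not stated here), plus a hypothetical helper that splits one axis into index ranges so the matrix can be read piece by piece:

```python
from typing import Iterator, Tuple


def dense_matrix_bytes(n_samples: int, n_genes: int, bytes_per_value: int) -> int:
    """Bytes needed to hold an n_samples x n_genes matrix fully in memory."""
    return n_samples * n_genes * bytes_per_value


def index_blocks(n: int, block_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) ranges that cover 0..n in consecutive blocks of block_size."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, n, block_size):
        yield start, min(start + block_size, n)


if __name__ == "__main__":
    # Dimensions from the description; 4 bytes per value is an assumption.
    total = dense_matrix_bytes(238_522, 35_238, 4)
    print(f"{total / 1e9:.1f} GB")  # 33.6 GB
    # Three sample blocks; each could be read separately instead of the whole matrix.
    print(list(index_blocks(238_522, 100_000)))
```

    With an HDF5 library such as h5py, slicing a dataset reads only the requested part from disk, so each (start, stop) range would be read as one slice. The dataset path inside the file and which axis holds the samples are not given in this description, so they are left out of the sketch.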

    There is also similar data for mouse, as well as CSV files with t-SNE projections and gene correlation matrices.

    Acknowledgements

    The ARCHS4 project is by :

    'Alexander Lachmann', 'alexander.lachmann@mssm.edu', update: '2020-02-06'

  5. Bitcoin Blockchain Historical Data

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2019). Bitcoin Blockchain Historical Data [Dataset]. https://www.kaggle.com/bigquery/bitcoin-blockchain
    Available download formats: zip (0 bytes)
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Blockchain technology, first implemented by Satoshi Nakamoto in 2009 as a core component of Bitcoin, is a distributed, public ledger that records transactions. It enables secure peer-to-peer exchange by linking blocks, each containing a hash pointer to the previous block, a timestamp, and transaction data. Bitcoin is a decentralized digital currency (cryptocurrency) that uses the blockchain to store transactions in a distributed manner, in order to mitigate flaws in the financial industry.
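    The hash-pointer structure described above can be illustrated with a toy chain in Python. This is a simplified sketch, not Bitcoin's actual format: real Bitcoin blocks hash a binary header with double SHA-256 and summarize their transactions in a Merkle root, and there is no proof-of-work here. The block fields, the JSON serialization, and the helper names are illustrative choices:

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    prev_hash: str             # hash pointer to the previous block
    timestamp: float
    transactions: List[str]    # toy transaction strings, not real Bitcoin transactions
    block_hash: str = field(init=False)

    def __post_init__(self) -> None:
        self.block_hash = self.compute_hash()

    def compute_hash(self) -> str:
        """SHA-256 over a JSON encoding of the block contents (a simplification)."""
        payload = json.dumps(
            {"prev": self.prev_hash, "ts": self.timestamp, "txs": self.transactions},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: List[Block], transactions: List[str],
                 timestamp: Optional[float] = None) -> Block:
    """Link a new block to the current tip of the chain and append it."""
    prev_hash = chain[-1].block_hash if chain else "0" * 64  # first block has no predecessor
    block = Block(prev_hash, time.time() if timestamp is None else timestamp, transactions)
    chain.append(block)
    return block


def is_valid(chain: List[Block]) -> bool:
    """Recompute every hash and check that each pointer matches its predecessor."""
    for i, block in enumerate(chain):
        if block.block_hash != block.compute_hash():
            return False  # contents changed after the block was hashed
        if i > 0 and block.prev_hash != chain[i - 1].block_hash:
            return False  # pointer no longer matches the previous block
    return True


if __name__ == "__main__":
    chain = []
    append_block(chain, ["alice->bob:1"], timestamp=0.0)
    append_block(chain, ["bob->carol:0.5"], timestamp=1.0)
    print(is_valid(chain))  # True
    chain[0].transactions[0] = "alice->mallory:1"  # tamper with history
    print(is_valid(chain))  # False
```

    Because each block stores the hash of its predecessor, changing any earlier transaction changes that block's hash and breaks validation, which is what makes the ledger tamper-evident.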

    Nearly ten years after its inception, Bitcoin and other cryptocurrencies experienced an explosion in popular awareness, while the value of Bitcoin has been highly volatile. Meanwhile, as use cases of Bitcoin and blockchain grow, mature, and expand, hype and controversy have swirled.

    Content

    In this dataset, you will have access to information about blockchain blocks and transactions. All historical data is in the bigquery-public-data:crypto_bitcoin dataset, which is updated every 10 minutes. The data can be joined with historical prices in kernels. See similar datasets here: https://www.kaggle.com/datasets?search=bitcoin.

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.crypto_bitcoin.[TABLENAME]. Fork this kernel to get started.
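    As a starting point for the first question under Inspiration (how many bitcoins are sent each day), a sketch of a daily-volume query in Python. It assumes the `transactions` table exposes `block_timestamp` and an `output_value` column in satoshis (1 BTC = 10^8 satoshis); verify the schema before relying on it. Summed output value also includes change returned to the sender, so it overstates the amount actually transferred. The query text is built separately from execution so it can be inspected without credentials:

```python
import datetime

TABLE = "bigquery-public-data.crypto_bitcoin.transactions"


def daily_btc_sent_query(table: str = TABLE) -> str:
    """SQL summing transaction outputs per day, converted from satoshis to BTC."""
    # block_timestamp and output_value are assumptions about the table schema.
    return (
        "SELECT DATE(block_timestamp) AS day,\n"
        "       SUM(output_value) / 1e8 AS btc_sent\n"
        f"FROM `{table}`\n"
        "WHERE DATE(block_timestamp) >= @start_date\n"
        "GROUP BY day\n"
        "ORDER BY day"
    )


def run_query(sql: str, start_date: str = "2018-01-01") -> list:
    """Execute the query with the google-cloud-bigquery client (needs credentials)."""
    from google.cloud import bigquery  # imported here so building the SQL needs no extra package

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            # Passed as a query parameter rather than pasted into the SQL string.
            bigquery.ScalarQueryParameter(
                "start_date", "DATE", datetime.date.fromisoformat(start_date)
            )
        ]
    )
    return list(client.query(sql, job_config=job_config).result())


if __name__ == "__main__":
    print(daily_btc_sent_query())
```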

    Method & Acknowledgements

    Allen Day, Google Cloud Developer Advocate, and Colin Bookman, Google Cloud Customer Engineer, retrieve data from the Bitcoin network using a custom client, available on GitHub, that they built with the bitcoinj Java library. Historical data from the origin block to 2018-01-31 was loaded in bulk into two BigQuery tables, blocks_raw and transactions. These tables stay current, since new data is appended as new blocks are broadcast to the Bitcoin network. For additional information, see the Google Cloud Big Data and Machine Learning Blog post "Bitcoin in BigQuery: Blockchain analytics on public data".


    Inspiration

    • How many bitcoins are sent each day?
    • How many addresses receive bitcoin each day?
    • Compare transaction volume to historical prices by joining with other available data sources

  6. Google 2020-2025 Stock Market

    • kaggle.com
    zip
    Updated Jan 13, 2025
    Negin Moghadasi (2025). Google 2020-2025 Stock Market [Dataset]. https://www.kaggle.com/datasets/negmgh/google-2020-2025-stock-market
    Available download formats: zip (23,003 bytes)
    Dataset updated
    Jan 13, 2025
    Authors
    Negin Moghadasi
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Google 2020-2025 Stock Price

    Alphabet Inc. is an American multinational technology conglomerate holding company headquartered in Mountain View, California. Alphabet is the world's second-largest technology company by revenue, after Apple, and one of the world's most valuable companies. It was created through a restructuring of Google on October 2, 2015, and became the parent holding company of Google and several former Google subsidiaries. It is considered one of the Big Five American information technology companies, alongside Amazon, Apple, Meta, and Microsoft.

    The establishment of Alphabet Inc. was prompted by a desire to make the core Google business "cleaner and more accountable" while allowing greater autonomy to group companies that operate in businesses other than Internet services. Founders Larry Page and Sergey Brin announced their resignation from their executive posts in December 2019, with the CEO role to be filled by Sundar Pichai, who is also the CEO of Google. Page and Brin remain employees, board members, and controlling shareholders of Alphabet Inc.

    Source: https://en.wikipedia.org/wiki/Alphabet_Inc.

    Information about this dataset

    This dataset provides historical data for GOOG stock (Google) at a daily level. Prices are in USD.

    The following fields are key indicators in stock market trading and analysis. They describe a stock's price movements and trading activity over a specific period (e.g., a day, week, or month):

    Close Price:

    The final price at which a stock trades during a specific trading session (e.g., at the end of the day). This price is often used as a reference point for comparing daily price movements.

    Open Price:

    The first price at which a stock trades when the market opens for the day. It can be influenced by after-hours trading, news, or economic events.

    High Price:

    The highest price at which a stock trades during a specific trading session. It shows the maximum value reached by the stock in that period.

    Low Price:

    The lowest price at which a stock trades during a specific trading session. It represents the minimum value reached by the stock in that period.

    Volume:

    The total number of shares traded during a specific period. It indicates the level of interest or activity in a stock, with higher volumes often reflecting greater market interest or volatility.
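    The definitions above imply simple invariants: within a session, High is at least as large as both Open and Close, and Low is no larger than either. A minimal Python sketch that loads daily rows, checks those invariants, and computes close-to-close percentage changes. The header names `Date`, `Open`, `High`, `Low`, `Close`, `Volume` and the `Bar` type are assumptions, since the CSV layout is not described here:

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Bar:
    date: str
    open: float
    high: float
    low: float
    close: float
    volume: int


def load_bars(lines: Iterable[str]) -> List[Bar]:
    """Parse daily rows; the header names used here are assumptions."""
    bars = []
    for raw in csv.DictReader(lines):
        bars.append(Bar(
            date=raw["Date"],
            open=float(raw["Open"]),
            high=float(raw["High"]),
            low=float(raw["Low"]),
            close=float(raw["Close"]),
            volume=int(float(raw["Volume"])),
        ))
    return bars


def is_consistent(bar: Bar) -> bool:
    """High must bound Open and Close from above, and Low must bound them from below."""
    return bar.low <= min(bar.open, bar.close) and max(bar.open, bar.close) <= bar.high


def daily_returns(bars: List[Bar]) -> List[float]:
    """Close-to-close percentage change between consecutive rows (assumed sorted by date)."""
    return [(cur.close / prev.close - 1.0) * 100.0 for prev, cur in zip(bars, bars[1:])]


if __name__ == "__main__":
    sample = io.StringIO(
        "Date,Open,High,Low,Close,Volume\n"
        "2025-01-02,100.0,102.0,99.0,101.0,1000\n"
        "2025-01-03,101.0,103.0,100.5,102.01,1200\n"
    )
    bars = load_bars(sample)
    print(all(is_consistent(b) for b in bars))         # True
    print([round(r, 2) for r in daily_returns(bars)])  # [1.0]
```

    Rows that fail `is_consistent` usually point to a parsing or column-order problem rather than to real market behavior, so the check is a cheap first step before any analysis.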


MEDICINA-corpus_reducido+MIR+wiki

Spanish corpus + wiki information + med books
