This dataset contains the tokenized version of a corpus made up of 60% of the Spanish OSCAR corpus, wiki data from multiple countries, and medical books. Because the data is so large, I had to cut the OSCAR corpus to make it a little smaller. For the same reason, I uploaded the tokenized version: if you want or need to work with this dataset inside Kaggle, there is not enough disk space to tokenize it there.
I have also uploaded the code used to tokenize the dataset.
If you want me to upload the entire dataset divided into 4 parts, ask for it. :)
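The description does not show how the tokenized files are structured. Assuming each document has already been converted into a list of integer token IDs, a common next step for language-model training is to concatenate the sequences and cut them into fixed-length blocks. The sketch below illustrates that step; `group_into_blocks` and the block size are hypothetical and are not taken from the uploaded tokenization code.

```python
from typing import Iterable, List


def group_into_blocks(sequences: Iterable[List[int]], block_size: int) -> List[List[int]]:
    """Concatenate token-ID sequences and split them into equal-length blocks.

    Trailing tokens that do not fill a whole block are dropped.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    flat: List[int] = []
    for seq in sequences:
        flat.extend(seq)
    usable = (len(flat) // block_size) * block_size
    return [flat[i:i + block_size] for i in range(0, usable, block_size)]


# Toy token IDs for three short documents (made up for illustration).
blocks = group_into_blocks([[1, 2, 3], [4, 5], [6, 7, 8, 9]], block_size=4)
```

Dropping the incomplete tail keeps every block the same length, which simplifies batching; the alternative is to pad the last block.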
https://www.cognitivemarketresearch.com/privacy-policy
According to Cognitive Market Research, the global Information Technology market size was USD XX Million in 2024 and is set to reach USD XX Million by the end of 2033, growing at a CAGR of XX% from 2025 to 2033.
North America held the largest share, xx%, in 2024.
Europe held a share of xx% in 2024.
Asia-Pacific held a significant share of xx% in 2024.
South America held a significant share of xx% in 2024.
Middle East and Africa held a significant share of xx% in 2024.
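The report leaves its figures as placeholders, so none can be checked here. For reference, a compound annual growth rate relates a starting value V_0 to the value V_n reached n years later by V_n = V_0 (1 + CAGR)^n. A minimal sketch with made-up inputs (the function names are illustrative):

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate taking start_value to end_value over `years` years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: float) -> float:
    """Value reached by compounding start_value at `rate` per year for `years` years."""
    return start_value * (1.0 + rate) ** years


# Made-up inputs: a market that doubles over eight years.
rate = cagr(100.0, 200.0, 8)
```

Here `rate` equals 2^(1/8) - 1, roughly 9.05% per year. Whether a 2025 to 2033 window counts as 8 or 9 compounding periods depends on the report's base-year convention, which the description does not state.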
Market Dynamics of the Information Technology (IT) Market
Key Drivers of the Information Technology (IT) Market
The Growing Adoption of Cloud Computing, Artificial Intelligence, and Big Data
The extensive incorporation of cutting-edge digital technologies—cloud computing, AI, and big data—serves as a key catalyst for the growth of the IT market. Cloud computing provides businesses with scalable and adaptable infrastructure, AI enhances operational efficiency through automation and predictive analytics, and big data supports informed decision-making. For example, Atera’s collaboration with Azure OpenAI facilitates predictive issue resolution and significantly enhances IT productivity. These technologies are transforming workflows across various industries and driving innovation, ensuring that the IT sector remains at the forefront of global digital transformation.
The Transformative Influence of IoT is Enhancing the Global IT Sector
The rapid proliferation of Internet of Things (IoT) devices—projected to exceed 16.6 billion by the close of 2023—has intensified the demand for IT infrastructure, services, and analytics. IoT fosters real-time data gathering, automation, and predictive maintenance in sectors such as healthcare, manufacturing, and smart cities. The immense data produced by interconnected devices is propelling advancements in AI, cloud computing, and edge computing. With increasing investments in 5G and digital infrastructure, IoT continues to serve as a vital enabler of IT market growth on a global scale.
(Source: https://iot-analytics.com/product/state-of-iot-summer-2024/)
Key Restraints in the Information Technology (IT) Market
Growing Concerns Regarding Data Privacy are Impeding IT Market Expansion
High-profile cyber incidents, such as the 2021 Microsoft Exchange Server breach, have triggered considerable anxiety regarding data security. Consumer apprehensions about surveillance, unauthorized access, and the corporate misuse of personal data are on the rise. According to Deloitte, almost 60% of consumers express concerns about security breaches, with trust in corporate data management notably diminished. This situation has prompted demands for more stringent privacy regulations and may hinder digital adoption due to heightened compliance requirements and public skepticism.
(Source: https://en.wikipedia.org/wiki/WannaCry_ransomware_attack)
Cybersecurity Threats and the Escalation of Attack Complexity
The emergence of intricate cyber threats, such as ransomware (e.g., WannaCry), poses a persistent challenge for the IT industry. Cybercriminals take advantage of weaknesses in essential systems, leading to financial losses, data breaches, and damage to reputation. Tackling cybersecurity necessitates ongoing investment in threat detection, endpoint security, and adherence to regulations. These evolving threats not only increase operational expenses but also discourage smaller enterprises from adopting advanced IT solutions due to the fear of vulnerability.
Key Trends in the Information Technology (IT) Market
Expansion of Edge Computing to Facilitate Real-Time Applications
As IoT and smart devices become more prevalent, edge computing is gaining traction by processing data nearer to its source. This approach minimizes latency and enhances response times, making it particularly suitable for real-time applications such as autonomous vehicles, smart manufacturing, and augmented reality. The shift towards edge infrastructure is transforming IT architectures to more effectively balance cloud and on-premises computing requirements.
Increase i...
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
A census of all the wikis hosted on Wikia (now renamed Fandom): a dataset with data on more than 300,000 wikis, such as language, topic, number of users, admins, articles, edits, and pages, number of users with a certain number of contributions, number of bots, etc.
A study of this data was presented at the OpenSym 2018 conference. You can find the Jupyter notebook code for that study under the "Kernels" section.
There are several data files:
- wikia_stats.csv: general data about each wiki.
- wikia_stats_users.csv: general data about each wiki plus the number of human registered users, categorized by their number of edits in the last 30 days (Users_N).
- wikia_stats_users_birthdate.csv: all the data above plus the estimated date of birth.
If you are just looking for the complete Wikia census dataset, use the wikia_stats_users_birthdate.csv file.
The other two .txt files contain (name, url) pairs: the raw index crawled from the Wikia sitemap, and the corresponding curated index containing only the working wikis.
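A minimal sketch for summarizing the census with Python's standard `csv` module. The column names used here (`url`, `language`, `articles`) and the sample rows are hypothetical; check the header of wikia_stats_users_birthdate.csv before adapting it.

```python
import csv
import io
from collections import Counter, defaultdict
from typing import Dict, Iterable

# Hypothetical excerpt; the real file has more (and possibly differently named) columns.
SAMPLE_CSV = """url,language,articles
http://wiki-a.example,en,120
http://wiki-b.example,es,40
http://wiki-c.example,en,15
"""


def wikis_per_language(rows: Iterable[Dict[str, str]], language_column: str = "language") -> Counter:
    """Count how many wikis use each language."""
    return Counter(row[language_column] for row in rows)


def articles_per_language(
    rows: Iterable[Dict[str, str]],
    language_column: str = "language",
    articles_column: str = "articles",
) -> Dict[str, int]:
    """Sum the article counts of all wikis in each language."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row[language_column]] += int(row[articles_column])
    return dict(totals)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
```

To run it on the real file, build `rows` from `csv.DictReader` over the opened CSV file (opened with `newline=""`) instead of the in-memory sample.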
The data for this second version was collected in October 2018; the first version was collected in February 2018.
The data was collected using the scripts located here: https://github.com/Grasia/wiki-scripts
Wikia does not clearly state a license for this data: it is publicly available on their website, but their license policy does not address it.
All of this data is available thanks to FANDOM, the company supporting Wikia, and thanks to all the contributors to the wikis.
We want to find the patterns that characterize a healthy and sustainable online community.
Wikia is a huge ecosystem of such communities, where small, medium, and large as well as young and old communities coexist, so it is a perfect setting for studying online collaboration.
This data is released under the Creative Commons Attribution-Share Alike License 3.0 (Unported) (CC-BY-SA). Please attribute FANDOM (The company behind Wikia) and me (Abel Serrano Juste) when using this data.
Remark: for cell cycle analysis, see the paper https://arxiv.org/abs/2208.05229 "Computational challenges of cell cycle analysis using single cell transcriptomics" by Alexander Chervov and Andrei Zinovyev.
The dataset was downloaded from https://amp.pharm.mssm.edu/archs4/download.html. The methods are described in a Nature Communications paper: https://www.nature.com/articles/s41467-018-03751-6
ARCHS4 provides user-friendly access to gene expression data from many experiments in the GEO database (https://www.ncbi.nlm.nih.gov/geo/). Whereas GEO stores most data in raw formats, ARCHS4 provides processed expression count matrices. And whereas GEO stores data separately for each research paper, ARCHS4 collects everything into a single matrix. See the main site for further information.
The main data files are in HDF5 (Hierarchical Data Format, .h5) format: https://en.wikipedia.org/wiki/Hierarchical_Data_Format. They contain the expression data as well as annotation data and further metadata. There are also several auxiliary files, such as 3D t-SNE projections (in CSV format) and gene correlation matrices for human and mouse (in feather format).
The main file for human, human_matrix.h5, contains the data matrix (238,522 samples by 35,238 genes) together with various metadata: gene names, sample information (tissue, etc.), and GEO database IDs where all the details can be found.
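The matrix dimensions help explain why the data ships as HDF5: HDF5 supports reading selected slices of a dataset without loading all of it. A quick size estimate for the human matrix, assuming (this is not stated in the description) 4 bytes per stored count, such as 32-bit integers, and no compression:

```python
SAMPLES = 238_522
GENES = 35_238
BYTES_PER_VALUE = 4  # assumption: 32-bit counts, uncompressed


def dense_matrix_bytes(n_rows: int, n_cols: int, bytes_per_value: int) -> int:
    """Size in bytes of a dense n_rows x n_cols matrix."""
    return n_rows * n_cols * bytes_per_value


cells = SAMPLES * GENES
size_bytes = dense_matrix_bytes(SAMPLES, GENES, BYTES_PER_VALUE)
print(f"{cells:,} values, about {size_bytes / 1e9:.1f} GB uncompressed")
```

This prints `8,405,038,236 values, about 33.6 GB uncompressed`. Under these assumptions the full matrix is too large to load casually, which is why reading only the genes or samples of interest is the usual approach.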
Similar data is available for mouse, along with CSV files containing the t-SNE projections and gene correlation matrices.
The ARCHS4 project is by Alexander Lachmann (alexander.lachmann@mssm.edu); update: 2020-02-06.
https://creativecommons.org/publicdomain/zero/1.0/
Blockchain technology, first implemented by Satoshi Nakamoto in 2009 as a core component of Bitcoin, is a distributed, public ledger recording transactions. Its usage allows secure peer-to-peer communication by linking blocks containing hash pointers to a previous block, a timestamp, and transaction data. Bitcoin is a decentralized digital currency (cryptocurrency) which leverages the Blockchain to store transactions in a distributed manner in order to mitigate against flaws in the financial industry.
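The hash-pointer structure described above can be illustrated with a toy chain. This is a simplified sketch, not Bitcoin's actual format: Bitcoin hashes an 80-byte binary block header with double SHA-256 and commits to transactions through a Merkle root, whereas here each block is hashed once with SHA-256 over a JSON encoding. The class and function names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


@dataclass
class Block:
    prev_hash: str           # hash pointer to the previous block
    timestamp: float
    transactions: List[str]
    block_hash: str = field(init=False)

    def __post_init__(self) -> None:
        self.block_hash = self.compute_hash()

    def compute_hash(self) -> str:
        """SHA-256 over a canonical JSON encoding of the block's contents."""
        payload = json.dumps(
            {"prev_hash": self.prev_hash, "timestamp": self.timestamp,
             "transactions": self.transactions},
            sort_keys=True,
        ).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[Block], transactions: List[str], timestamp: float) -> Block:
    """Create a block that points at the current tip of the chain and append it."""
    prev_hash = chain[-1].block_hash if chain else GENESIS_PREV_HASH
    block = Block(prev_hash, timestamp, list(transactions))
    chain.append(block)
    return block


def is_valid(chain: List[Block]) -> bool:
    """Check every stored hash and every hash pointer in the chain."""
    for i, block in enumerate(chain):
        if block.block_hash != block.compute_hash():
            return False
        expected_prev = chain[i - 1].block_hash if i > 0 else GENESIS_PREV_HASH
        if block.prev_hash != expected_prev:
            return False
    return True
```

Because each block stores the previous block's hash, editing an earlier transaction changes that block's recomputed hash and breaks its link to the following block, which `is_valid` detects.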
Nearly ten years after its inception, Bitcoin and other cryptocurrencies experienced an explosion in popular awareness. The value of Bitcoin, on the other hand, has experienced more volatility. Meanwhile, as use cases of Bitcoin and Blockchain grow, mature, and expand, hype and controversy have swirled.
In this dataset, you will have access to information about blockchain blocks and transactions. All historical data is in the bigquery-public-data:crypto_bitcoin dataset, which is updated every 10 minutes. The data can be joined with historical prices in kernels. Similar datasets are available here: https://www.kaggle.com/datasets?search=bitcoin.
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.crypto_bitcoin.[TABLENAME]. Fork this kernel to get started.
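A minimal sketch of querying the public tables with the `google-cloud-bigquery` Python client. The `timestamp` column in the `blocks` table is assumed from the public crypto_bitcoin schema; verify it against the table schema before running. The function names are hypothetical, and actually running the query requires the client library and BigQuery access (for example, in one of the Kernels mentioned above).

```python
from datetime import date
from typing import List, Tuple

BLOCKS_TABLE = "bigquery-public-data.crypto_bitcoin.blocks"


def daily_block_count_sql(table: str = BLOCKS_TABLE) -> str:
    """Parameterized query counting blocks per day between @start_date and @end_date."""
    return f"""
SELECT DATE(timestamp) AS day, COUNT(*) AS n_blocks
FROM `{table}`
WHERE DATE(timestamp) BETWEEN @start_date AND @end_date
GROUP BY day
ORDER BY day
"""


def run_daily_block_count(start: date, end: date) -> List[Tuple[date, int]]:
    """Run the query; needs google-cloud-bigquery and BigQuery credentials."""
    # Imported lazily so the SQL builder above works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("start_date", "DATE", start),
            bigquery.ScalarQueryParameter("end_date", "DATE", end),
        ]
    )
    rows = client.query(daily_block_count_sql(), job_config=job_config).result()
    return [(row["day"], row["n_blocks"]) for row in rows]
```

For example, `run_daily_block_count(date(2018, 1, 1), date(2018, 1, 31))` would return (day, block count) pairs for January 2018. Passing the dates as query parameters, rather than formatting them into the SQL string, avoids quoting mistakes.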
Allen Day (Twitter | Medium), Google Cloud Developer Advocate & Colin Bookman, Google Cloud Customer Engineer retrieve data from the Bitcoin network using a custom client available on GitHub that they built with the bitcoinj Java library. Historical data from the origin block to 2018-01-31 were loaded in bulk to two BigQuery tables, blocks_raw and transactions. These tables contain fresh data, as they are now appended when new blocks are broadcast to the Bitcoin network. For additional information visit the Google Cloud Big Data and Machine Learning Blog post "Bitcoin in BigQuery: Blockchain analytics on public data".
Photo by Andre Francois on Unsplash.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Alphabet Inc. is an American multinational technology conglomerate holding company headquartered in Mountain View, California. Alphabet is the world's second-largest technology company by revenue, after Apple, and one of the world's most valuable companies. It was created through a restructuring of Google on October 2, 2015, and became the parent holding company of Google and several former Google subsidiaries. It is considered one of the Big Five American information technology companies, alongside Amazon, Apple, Meta, and Microsoft.
The establishment of Alphabet Inc. was prompted by a desire to make the core Google business "cleaner and more accountable" while allowing greater autonomy to group companies that operate in businesses other than Internet services. Founders Larry Page and Sergey Brin announced their resignation from their executive posts in December 2019, with the CEO role to be filled by Sundar Pichai, who is also the CEO of Google. Page and Brin remain employees, board members, and controlling shareholders of Alphabet Inc.
Source: https://en.wikipedia.org/wiki/Alphabet_Inc.
This dataset provides daily historical data for GOOG stock (Google). Prices are in USD.
These terms are key indicators in stock market trading and analysis, providing information about a stock's price movements and trading activity over a specific period (e.g., a day, week, or month):
- Close: The final price at which a stock trades during a specific trading session (e.g., at the end of the day). This price is often used as a reference point for comparing daily price movements.
- Open: The first price at which a stock trades when the market opens for the day. It can be influenced by after-hours trading, news, or economic events.
- High: The highest price at which a stock trades during a specific trading session. It shows the maximum value reached by the stock in that period.
- Low: The lowest price at which a stock trades during a specific trading session. It represents the minimum value reached by the stock in that period.
- Volume: The total number of shares traded during a specific period. It indicates the level of interest or activity in a stock, with higher volumes often reflecting greater market interest or volatility.
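A minimal sketch of two common quantities derived from these columns: the day-over-day return from consecutive Close prices, and the intraday range (High minus Low). The prices below are made up for illustration and are not real GOOG data; the dataset's actual CSV column names are not listed in the description.

```python
from typing import List, Sequence


def daily_returns(closes: Sequence[float]) -> List[float]:
    """Simple returns between consecutive closes: close[t] / close[t-1] - 1."""
    return [curr / prev - 1.0 for prev, curr in zip(closes, closes[1:])]


def intraday_ranges(highs: Sequence[float], lows: Sequence[float]) -> List[float]:
    """High minus Low for each trading session."""
    if len(highs) != len(lows):
        raise ValueError("highs and lows must have the same length")
    return [high - low for high, low in zip(highs, lows)]


# Made-up prices for illustration only (not real GOOG data).
returns = daily_returns([100.0, 110.0, 99.0])
ranges = intraday_ranges([105.0, 112.0], [98.0, 107.5])
```

To apply it to the dataset, read the Close, High, and Low columns into lists of floats in date order (for example with `csv.DictReader`) before calling these functions.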