Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset was created by Pranshu Jayswal
Released under CC BY-NC-SA 4.0
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by kwang
Released under CC0: Public Domain
This dataset was created by JITENDRA PYLA.
Released under Data files © Original Authors
This dataset was created by Po Yan Chuan
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains cleaned and time-synced OHLC (Open, High, Low, Close) data for the NIFTY 50 index, covering the period from 9th January 2015 to 25th April 2025.
It includes:
5-minute timeframe data (intraday)
25-minute aggregated interval (useful for trend and momentum strategies)
Daily candles for long-term technical setups
This dataset is ideal for:
Quantitative trading research
Algorithmic strategy backtesting (MACD, RSI, Price Action, etc.)
Time-series analysis & forecasting
No forward-filled or synthetic values were used — all data is from real market trading sessions.
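As a rough illustration of how the 5-minute and 25-minute intervals relate, the sketch below merges five consecutive 5-minute candles into one 25-minute candle. It is a minimal example under stated assumptions, not necessarily how the dataset author built the 25-minute file: the function name `aggregate_bars`, the dictionary keys, and the sample prices are all made up for illustration, and the bars are assumed to be in chronological order within a single session with no gaps.

```python
from typing import Dict, List

Bar = Dict[str, float]


def aggregate_bars(bars: List[Bar], group_size: int = 5) -> List[Bar]:
    """Merge consecutive bars: first open, highest high, lowest low, last close.

    A trailing group with fewer than `group_size` bars is still emitted.
    """
    merged: List[Bar] = []
    for start in range(0, len(bars), group_size):
        chunk = bars[start:start + group_size]
        merged.append({
            "open": chunk[0]["open"],
            "high": max(b["high"] for b in chunk),
            "low": min(b["low"] for b in chunk),
            "close": chunk[-1]["close"],
        })
    return merged


# Made-up 5-minute bars for illustration only (not taken from the dataset).
five_min = [
    {"open": 100.0, "high": 101.0, "low": 99.5, "close": 100.5},
    {"open": 100.5, "high": 102.0, "low": 100.0, "close": 101.5},
    {"open": 101.5, "high": 101.8, "low": 100.8, "close": 101.0},
    {"open": 101.0, "high": 103.0, "low": 100.9, "close": 102.5},
    {"open": 102.5, "high": 102.6, "low": 101.2, "close": 101.4},
]
bar_25 = aggregate_bars(five_min)[0]
print(bar_25)
```

In practice the grouping would normally be keyed on timestamps rather than on position, so that a missing candle does not shift every later group.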
This dataset was created by Abhinesh reddy
This dataset was created by Foridur
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This dataset was used to develop quantitative regression QSAR models to predict acute aquatic toxicity towards the fish Pimephales promelas (fathead minnow) on a set of 908 chemicals. LC50, the concentration that causes death in 50% of test fish over a test duration of 96 hours, was used as the model response. The model comprised 6 molecular descriptors: MLOGP (molecular properties), CIC0 (information indices), GATS1i (2D autocorrelations), NdssC (atom-type counts), NdsCH (atom-type counts), SM1_Dz(Z) (2D matrix-based descriptors). Details can be found in the quoted reference: M. Cassotti, D. Ballabio, R. Todeschini, V. Consonni. A similarity-based QSAR model for predicting acute toxicity towards the fathead minnow (Pimephales promelas), SAR and QSAR in Environmental Research (2015), 26, 217-243; doi: 10.1080/1062936X.2015.1018938
6 molecular descriptors and 1 quantitative experimental response: 1) CIC0 2) SM1_Dz(Z) 3) GATS1i 4) NdsCH 5) NdssC 6) MLOGP 7) quantitative response, LC50 [-LOG(mol/L)]
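The response unit, LC50 [-LOG(mol/L)], means the molar concentration is stored on a negative base-10 logarithmic scale, so a larger value corresponds to a lower lethal concentration (a more toxic chemical). The short sketch below shows the conversion in both directions; the function names and the example concentration are illustrative assumptions, not part of the dataset.

```python
import math


def lc50_to_neg_log(lc50_mol_per_l: float) -> float:
    """Convert an LC50 in mol/L to the -log10(mol/L) scale."""
    if lc50_mol_per_l <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(lc50_mol_per_l)


def neg_log_to_lc50(value: float) -> float:
    """Convert a -log10(mol/L) response back to a concentration in mol/L."""
    return 10.0 ** (-value)


# Illustrative value only: an LC50 of 0.0001 mol/L sits at about 4 on the log scale.
response = lc50_to_neg_log(1e-4)
print(response, neg_log_to_lc50(response))
```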
https://creativecommons.org/publicdomain/zero/1.0/
Data to understand how to transform raw Google Analytics data into qualitative usability metrics.
Google Analytics data (Title and Duration in seconds)
Hotjar (NPS)
Survey Monkey (NPS)
Bebideria.com.br's developers and its readers.
Google's developers and the communities that keep it running and free.
Hotjar's team and CEO, who supported me in exporting raw data and even improved the app to meet my experiment's needs.
Finally, UDESC's PPGDesign professors, who guided this study.
Let's make better products!
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset originates from a deep learning model trained on the "Coimbra Breast Cancer" dataset, with feature distributions closely resembling the original. The original data includes clinical observations from 64 patients with breast cancer and 52 healthy controls, encompassing 10 quantitative predictors and a binary dependent variable indicating the presence or absence of breast cancer.
Quantitative Attributes:
Age (years): Represents the age of individuals in the dataset.
BMI (kg/m²): Body Mass Index, a measure of body fat based on weight and height.
Glucose (mg/dL): Reflects blood glucose levels, a vital metabolic indicator.
Insulin (µU/mL): Indicates insulin levels, a hormone associated with glucose regulation.
HOMA: Homeostatic Model Assessment, a method assessing insulin resistance and beta-cell function.
Leptin (ng/mL): Represents leptin levels, a hormone involved in appetite and energy balance regulation.
Adiponectin (µg/mL): Reflects adiponectin levels, a protein associated with metabolic regulation.
Resistin (ng/mL): Indicates resistin levels, a protein implicated in insulin resistance.
MCP-1 (pg/dL): Reflects Monocyte Chemoattractant Protein-1 levels, a cytokine involved in inflammation.
Labels:
1: Healthy controls
2: Patients with breast cancer
These quantitative attributes, including anthropometric data and parameters gathered from routine blood analysis, serve as the foundation for potential biomarkers of breast cancer. The dataset presents an opportunity for developing accurate prediction models, aiding in the identification and understanding of factors associated with breast cancer.
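The description does not state how the HOMA column was computed. A widely used approximation for the insulin-resistance index (HOMA-IR) multiplies fasting glucose in mg/dL by fasting insulin in µU/mL and divides by 405; the sketch below assumes that formula, and the function name and input values are illustrative rather than taken from the dataset.

```python
def homa_ir(glucose_mg_dl: float, insulin_uu_ml: float) -> float:
    """Approximate HOMA-IR from fasting glucose (mg/dL) and insulin (µU/mL).

    Assumes the common formula glucose * insulin / 405; the dataset's own
    HOMA column may have been derived with a slightly different constant.
    """
    return glucose_mg_dl * insulin_uu_ml / 405.0


# Illustrative inputs only: 90 mg/dL glucose and 9 µU/mL insulin.
print(homa_ir(90.0, 9.0))
```

Because HOMA is derived from Glucose and Insulin, it is strongly related to those two columns, which is worth keeping in mind when interpreting feature importances.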
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Macroeconomic data is an important source for institutions and companies that want a rough sense of where government policy and the economy are heading. This dataset can help macroeconomic and fundamental analysts research the Chinese market and economy, and quantitative researchers can use it as a reference when building strategies. SHIBOR rates for different maturities are recorded at daily frequency, so users can construct the yield curve for economic research or study how SHIBOR influences the Chinese stock and fixed income markets. Several Chinese indices, which are also important for research on the Chinese market and economy, are likewise recorded at daily frequency. Other macroeconomic data are recorded at monthly frequency and can be used for broader economic and financial research.
Topic Modeling for Research Articles
Researchers have access to large online archives of scientific articles. As a consequence, finding relevant articles has become more difficult. Tagging or topic modelling provides a way to assign identifying tags to research articles, which facilitates the recommendation and search process.
Given the abstract and title for a set of research articles, predict the topics for each article included in the test set.
Note that a research article can possibly have more than 1 topic. The research article abstracts and titles are sourced from the following 6 topics:
Computer Science
Physics
Mathematics
Statistics
Quantitative Biology
Quantitative Finance
Column | Description |
---|---|
ID | Unique ID for each article |
TITLE | Title of the research article |
ABSTRACT | Abstract of the research article |
Computer Science | Whether article belongs to topic computer science (1/0) |
Physics | Whether article belongs to topic physics (1/0) |
Mathematics | Whether article belongs to topic Mathematics (1/0) |
Statistics | Whether article belongs to topic Statistics (1/0) |
Quantitative Biology | Whether article belongs to topic Quantitative Biology (1/0) |
Quantitative Finance | Whether article belongs to topic Quantitative Finance (1/0) |
Column | Description |
---|---|
ID | Unique ID for each article |
TITLE | Title of the research article |
ABSTRACT | Abstract of the research article |
Submissions are evaluated on the micro-averaged F1 score between the predicted and observed topics for each article in the test set.
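For a multi-label task like this one, micro-averaging pools true positives, false positives and false negatives over every article and every topic column before computing F1. The sketch below is a minimal pure-Python version under that standard definition; the function name and the toy label matrices are illustrative, with the six columns standing in for the six topic columns.

```python
from typing import Sequence


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1 for 0/1 label matrices (rows = articles, columns = topics)."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    # With no positive labels and no positive predictions, return 0.0 by convention.
    return 2 * tp / denom if denom else 0.0


# Toy labels for three articles and six topics (made up for illustration).
y_true = [[1, 0, 0, 1, 0, 0], [0, 1, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0]]
y_pred = [[1, 0, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 0]]
print(micro_f1(y_true, y_pred))  # 4 TP, 1 FP, 1 FN -> 0.8
```

Because the counts are pooled, frequent topics influence the score more than rare ones.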
### Public and Private Split
The test data is further divided into Public (40%) and Private (60%) sets.
Your initial responses will be checked and scored on the Public data. The final rankings will be based on your Private score, which will be published once the competition is over.
https://creativecommons.org/publicdomain/zero/1.0/
The data and kernels here accompany the lecture material on image segmentation from the Quantitative Big Imaging course at ETH Zurich (kmader.github.io/Quantitative-Big-Imaging-2017). The exercise tasks are described at https://github.com/kmader/Quantitative-Big-Imaging-2016/blob/master/Exercises/03-Description.md
The data was updated on March 13, 2022.
We have uploaded the complete data analysis as a .txt file. In addition, we uploaded information on age, sex, hospital room, and mapping rate as supplemental data.
Changes from the previous version: Salmon's --validateMappings option was used for quantification, and salmon quantmerge was used in place of tximport.
126 RNA-Seq datasets of COVID-19 (100 COVID-19, 26 control cases).
Let’s find important transcripts in COVID-19 by using machine learning👍
RNA-Seq is a particular technology-based sequencing technique which uses next-generation sequencing (NGS) to reveal the presence and quantity of RNA in a biological sample, analyzing the continuously changing cellular transcriptome.
https://en.wikipedia.org/wiki/RNA-Seq
We downloaded SRA and FASTQ files and prepared 126 RNA-Seq datasets. The known transcriptome (GRCh38) was used as a reference for quantitative analysis. The abundances of individual transcripts were quantified by Salmon. Finally, we applied tximport to create an output file for analysis.
Content Source : https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE157103
This dataset deals with the factors that influence the compressive strength of concrete. There are a few input parameters and an output parameter.
Name | Data Type | Measurement | Description |
---|---|---|---|
Cement (component 1) | quantitative | kg in a m3 mixture | Input Variable |
Blast Furnace Slag (component 2) | quantitative | kg in a m3 mixture | Input Variable |
Fly Ash (component 3) | quantitative | kg in a m3 mixture | Input Variable |
Water (component 4) | quantitative | kg in a m3 mixture | Input Variable |
Superplasticizer (component 5) | quantitative | kg in a m3 mixture | Input Variable |
Coarse Aggregate (component 6) | quantitative | kg in a m3 mixture | Input Variable |
Fine Aggregate (component 7) | quantitative | kg in a m3 mixture | Input Variable |
Age | quantitative | Day (1~365) | Input Variable |
Concrete compressive strength | quantitative | MPa | Output Variable |
Data has been sourced from : http://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength
Concrete is the most important material in civil engineering.
The concrete compressive strength is a highly nonlinear function of age and ingredients.
Number of instances: 1030
Number of attributes: 9
Attribute breakdown: 8 quantitative input variables and 1 quantitative output variable
Missing attribute values: None
Given are the variable name, variable type, the measurement unit and a brief description. Predicting the concrete compressive strength is the regression problem. The order of this listing corresponds to the order of numerals along the rows of the database.
Name -- Data Type -- Measurement -- Description
Cement (component 1) -- quantitative -- kg in a m3 mixture -- Input Variable
Blast Furnace Slag (component 2) -- quantitative -- kg in a m3 mixture -- Input Variable
Fly Ash (component 3) -- quantitative -- kg in a m3 mixture -- Input Variable
Water (component 4) -- quantitative -- kg in a m3 mixture -- Input Variable
Superplasticizer (component 5) -- quantitative -- kg in a m3 mixture -- Input Variable
Coarse Aggregate (component 6) -- quantitative -- kg in a m3 mixture -- Input Variable
Fine Aggregate (component 7) -- quantitative -- kg in a m3 mixture -- Input Variable
Age -- quantitative -- Day (1~365) -- Input Variable
Concrete compressive strength -- quantitative -- MPa -- Output Variable
Original Owner and Donor
Prof. I-Cheng Yeh
Department of Information Management
Chung-Hua University,
Hsin Chu, Taiwan 30067, R.O.C.
e-mail:icyeh '@' chu.edu.tw
TEL:886-3-5186511
Date Donated: August 3, 2007
From: https://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength
Main
1) I-Cheng Yeh, "Modeling of strength of high performance concrete using artificial neural networks," Cement and Concrete Research, Vol. 28, No. 12, pp. 1797-1808 (1998).
Others
2) I-Cheng Yeh, "Modeling Concrete Strength with Augment-Neuron Networks," J. of Materials in Civil Engineering, ASCE, Vol. 10, No. 4, pp. 263-268 (1998).
3) I-Cheng Yeh, "Design of High Performance Concrete Mixture Using Neural Networks," J. of Computing in Civil Engineering, ASCE, Vol. 13, No. 1, pp. 36-42 (1999).
4) I-Cheng Yeh, "Prediction of Strength of Fly Ash and Slag Concrete By The Use of Artificial Neural Networks," Journal of the Chinese Institute of Civil and Hydraulic Engineering, Vol. 15, No. 4, pp. 659-663 (2003).
5) I-Cheng Yeh, "A mix Proportioning Methodology for Fly Ash and Slag Concrete Using Artificial Neural Networks," Chung Hua Journal of Science and Engineering, Vol. 1, No. 1, pp. 77-84 (2003).
6) Yeh, I-Cheng, "Analysis of strength of concrete using design of experiments and neural networks," Journal of Materials in Civil Engineering, ASCE, Vol.18, No.4, pp.597-604 (2006).
NOTE: Reuse of this database is unlimited with retention of copyright notice for Prof. I-Cheng Yeh and the following published paper:
I-Cheng Yeh, "Modeling of strength of high performance concrete using artificial neural networks," Cement and Concrete Research, Vol. 28, No. 12, pp. 1797-1808 (1998).
Can you predict the strength of concrete?
From a technological point of view, the world of Asset Management today is mainly linked to mature but inefficient supply chains, which merge discretionary and quantitative forecasting models. The financial industry has been working in the shadows for years to overcome this paradigm, pushing beyond technology, making use not only of automated models (trading systems and dynamic asset allocation systems) but also of the most modern Machine Learning techniques for Time Series Forecasting and Unsupervised Learning for the classification of financial instruments. However, in most cases, it uses proprietary technologies that are limited by definition (workforce, technology investment, scalability).
Numerai, a hedge fund backed by Renaissance Technologies co-founder Howard Morgan, was the first to blaze a new path by building a centralized machine learning competition in order to gather a swarm of predictors outside the company and integrate them with internal intelligence. The discretionary contribution was therefore eliminated, and the information content generated internally was enriched by thousands of external contributors, in many cases linked to sectors unrelated to the financial industry, such as energy, aerospace, or biotechnology. This overcomes the idea that good market forecasts require only skills related to the financial world.
What we have just described is the starting point of Rocket Capital Investment. To overcome the limit imposed by Numerai, a new competition has been engineered, which has the ambition to make this project even more “democratic”. How? By decentralizing, thanks to the Blockchain, the entire chain of participant management, collection, and validation of forecasts, as well as decisions relating to the evaluation and remuneration of the participants themselves. In this way, it is possible to make every aspect of the competition completely transparent and inviolable.
Everything is managed by a Smart Contract, whose rules are known and shared. Let’s find out in more detail what it is.
Starting from the idea of Numerai, we have completely re-engineered all aspects related to the management of participants, Scoring, and Reward, following the concept of decentralization of the production chain. To this end, a proprietary token (MUSA token) has been created which acts as an exchange currency and which integrates a smart contract that acts as an autonomous competition manager. The communication interface between the users and the smart contract is a DApp (“Decentralized Application”). But let’s see in more detail how all these elements combine with each other, like in a puzzle.
A suitably normalized dataset is issued every week, containing data from over 400 cryptocurrencies. For each asset, the data relating to prices, volumes traded, quantitative elements, as well as alternative data (information on the blockchain and on the sentiment of the various providers) are aggregated. Another difference with Numerai is the ability to distinguish assets for each row (the first column shows the related ticker). The last column instead contains the question to which the Data Scientists are asked to give an answer: the relative strength ranking of each asset, built on the forecast of the percentage change expected in the following week.
Registration for the Competition takes place by providing, in a completely anonymous way, the address of a crypto wallet on which the MUSA tokens are loaded. From that moment on, the MUSAs become, to all intents and purposes, the currency of exchange between participants and organizers. Every Monday a new Challenge opens, and all Data Scientists registered in the Contest are asked to use their models to generate predictions. By accessing the DApp, the participant can download the new dataset, complete with the history of the previous weeks and the last useful week. At this point the participant can perform two actions in sequence directly from the DApp:
- Staking: MUSA tokens are placed on your prediction.
- Submission: the forecast for the following week is uploaded to the blockchain.
Since the forecast consists of a series of numbers between 0 and 1 associated with each asset, it is very easy, the following week, to calculate the error committed in terms of RMSE (“Root Mean Square Error”). This allows creating a ranking on the participants, to be able to reward them accordingly with additional MUSA tokens. But let’s see in more detail how the Smart Contract, which was created, allows us to differentiate the reward based on different items (all, again, in a completely transparent and verifiable way):
- Staking Reward: the mere fact of participating in the competition is remunerated. In future versions, it will also be possible to bet on the goodness of the other participants’ predictions.
- Challenge Rew...
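The scoring idea described above (a relative strength ranking between 0 and 1 per asset, compared with the realized ranking via RMSE) can be sketched as follows. This is only an illustration under stated assumptions: the exact normalization used by the competition is not given here, so the sketch assumes ranks are scaled linearly to [0, 1] with no ties, and the tickers, percentage changes, and function names are hypothetical.

```python
import math
from typing import Dict


def normalized_ranks(changes: Dict[str, float]) -> Dict[str, float]:
    """Map each asset's percentage change to a rank in [0, 1] (0 = weakest).

    Assumed normalization: position / (n - 1); ties are not handled specially.
    """
    ordered = sorted(changes, key=changes.get)
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 0.5}
    return {ticker: i / (n - 1) for i, ticker in enumerate(ordered)}


def rmse(predicted: Dict[str, float], actual: Dict[str, float]) -> float:
    """Root Mean Square Error over the assets present in `actual`."""
    squared = [(predicted[t] - actual[t]) ** 2 for t in actual]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical tickers and weekly percentage changes, for illustration only.
realized = normalized_ranks({"AAA": 0.10, "BBB": -0.05, "CCC": 0.02})
forecast = {"AAA": 0.8, "BBB": 0.2, "CCC": 0.5}
score = rmse(forecast, realized)
print(realized, round(score, 4))
```

A lower RMSE means the submitted ranking was closer to the realized one, which is what lets participants be ordered for rewards.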
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes all the variables (choice, college, hsg2, coml5, typez, fuelz, pricez, speedz, pollutionz, sizez) from 2016 to 2018. I scraped this data from www.qed.econ.queensu.ca.