The ranges of values of the hyperparameters of the benchmark datasets.
UKCP09 Time series of extreme temperatures. Annual maximum temperature minus annual minimum temperature. The datasets have been created with financial support from the Department for Environment, Food and Rural Affairs (Defra) and they are being promoted by the UK Climate Impacts Programme (UKCIP) as part of the UK Climate Projections (UKCP09). http://ukclimateprojections.defra.gov.uk/content/view/12/689/. To view this data you will have to register on the Met Office website, here: http://www.metoffice.gov.uk/research/climate/climate-monitoring/UKCP09/register
Motivation: Home range is a common measure of animal space use as it provides ecological information that is useful for conservation applications. In macroecological studies, values are typically aggregated to species means to examine general patterns of animal space use. However, this ignores the environmental context in which the home range was estimated and does not account for intraspecific variation in home range size. In addition, the focus of macroecological studies on home ranges has been historically biased toward terrestrial mammals. The use of aggregated numbers and terrestrial focus limits our ability to examine home range patterns across different environments, variation in time and between different levels of organisation. Here we introduce HomeRange, a global database with 75,611 home-range values across 960 different mammal species, including terrestrial, as well as aquatic and aerial species. Main types of variable contained: The dataset contains mammal home-range estim...
Title of Dataset: HomeRange: A global database of mammalian home ranges
Mammalian home range papers were compiled via an extensive literature search. All home range values were extracted from the literature including individual, group and population-level home range values. Associated values were also compiled including species names, methodological information on data collection, home-range estimation method, period of data collection, study coordinates and name of location, as well as species traits derived from the studies, such as body mass, life stage, reproductive status and locomotor habit. Here we include the database, associated metadata and reference list of all sources from which home range data were extracted.
We also provide an R package, which can be installed from https://github.com/SHoeks/HomeRange. The HomeRange R package provides functions for downloading the latest version of the HomeRange database and loading it as a standard dataframe into R, plotting several statistics of the database and finally attaching species traits (e.g. species average body mass, trophic level) from the CO...
UKCP09: Gridded datasets of annual values. Extreme temperature range. The day-by-day sum of the mean number of degrees by which the air temperature is more than a value of 22 °C. Annual maximum temperature minus annual minimum temperature.
The datasets have been created with financial support from the Department for Environment, Food and Rural Affairs (Defra) and they are being promoted by the UK Climate Impacts Programme (UKCIP) as part of the UK Climate Projections (UKCP09). http://ukclimateprojections.defra.gov.uk/content/view/12/689/.
To view this data you will have to register on the Met Office website, here: http://www.metoffice.gov.uk/research/climate/climate-monitoring/UKCP09/register
The Savings Bond Value Files dataset is used by developers of bond pricing programs to update their systems with new redemption values for accrual savings bonds (Series E, EE, I & Savings Notes). The core data is the same as the Redemption Tables but there are differences in format, amount of data, and date range. The Savings Bonds Value Files dataset is meant for programmers and developers to read in redemption values without having to first convert PDFs.
The lidar Topographic Wetness Index (TWI) is the TWI data product produced and distributed by the National Park Service, Great Smoky Mountains National Park. Concave, low-gradient areas gather water (high TWI values), whereas steep, convex areas shed water (low TWI values). Values range from less than 1 (dry cells) to greater than 20 (wet cells).
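As a rough illustration of why flat, water-gathering cells score high and steep cells score low, the sketch below uses the conventional definition TWI = ln(a / tan β), where a is the specific catchment area and β the local slope. The description does not say which formulation the National Park Service product uses, so the formula, the function name, the `min_slope` guard and the example cells are all assumptions.

```python
import math


def topographic_wetness_index(specific_catchment_area, slope_radians, min_slope=1e-3):
    """Conventional TWI = ln(a / tan(beta)).

    specific_catchment_area: upslope area per unit contour width (assumed metres).
    min_slope: hypothetical floor that avoids division by zero on flat cells.
    """
    slope = max(slope_radians, min_slope)
    return math.log(specific_catchment_area / math.tan(slope))


# Made-up cells: a nearly flat valley bottom draining a large area,
# and a steep ridge cell draining almost nothing.
wet_cell = topographic_wetness_index(5000.0, math.radians(0.5))
dry_cell = topographic_wetness_index(2.0, math.radians(40.0))
print(round(wet_cell, 2), round(dry_cell, 2))
```

Under these assumptions the ridge cell falls below 1 and the valley cell lands well above it, which matches the dry-to-wet ordering of the value range quoted above.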
The Precipitation-Runoff Modeling System (PRMS) was used to produce simulations of streamflow for seven watersheds in eastern and central Montana for a baseline period (water years 1982-1999) and three future periods (water years 2021-2038, 2046-2063, and 2071-2088). The seven areas that were modeled are the O'Fallon, Redwater, Little Dry, Middle Musselshell, Judith, Cottonwood Creek, and Belt watersheds.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
In this study, we introduce the count-based Morgan fingerprint (C-MF) to represent chemical structures of contaminants and develop machine learning (ML)-based predictive models for their activities and properties. Compared with the binary Morgan fingerprint (B-MF), C-MF not only qualifies the presence or absence of an atom group but also quantifies its counts in a molecule. We employ six different ML algorithms (ridge regression, SVM, KNN, RF, XGBoost, and CatBoost) to develop models on 10 contaminant-related data sets based on C-MF and B-MF to compare them in terms of the model’s predictive performance, interpretation, and applicability domain (AD). Our results show that C-MF outperforms B-MF in nine of 10 data sets in terms of model predictive performance. The advantage of C-MF over B-MF is dependent on the ML algorithm, and the performance enhancements are proportional to the difference in the chemical diversity of data sets calculated by B-MF and C-MF. Model interpretation results show that the C-MF-based model can elucidate the effect of atom group counts on the target and have a wider range of SHAP values. AD analysis shows that C-MF-based models have an AD similar to that of B-MF-based ones. Finally, we developed a “ContaminaNET” platform to deploy these C-MF-based models for free use.
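The difference between the two representations can be sketched without any cheminformatics library. The example below folds a list of atom-environment identifiers into a fixed-length vector, either recording counts (C-MF style) or only presence (B-MF style). The identifiers, the vector length and the function name are made up for illustration; a real Morgan fingerprint derives the identifiers by hashing circular atom neighbourhoods, which this sketch does not attempt.

```python
from collections import Counter


def fold_fingerprint(environment_ids, n_bits=16, use_counts=True):
    """Fold hypothetical atom-environment identifiers into a vector of length n_bits."""
    vector = [0] * n_bits
    for env_id, count in Counter(environment_ids).items():
        position = env_id % n_bits
        if use_counts:
            vector[position] += count  # count-based: how many times the group occurs
        else:
            vector[position] = 1       # binary: only whether the group occurs
    return vector


# Made-up molecule: environment 3 occurs three times, 7 and 18 once each.
env_ids = [3, 3, 3, 7, 18]
count_fp = fold_fingerprint(env_ids, use_counts=True)
binary_fp = fold_fingerprint(env_ids, use_counts=False)
print(count_fp)
print(binary_fp)
```

Both vectors mark the same three positions, but only the count-based one retains that one group appears three times, which is the extra information the abstract attributes to C-MF.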
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
land and oceanic climate variables. The data cover the Earth on a 31km grid and resolve the atmosphere using 137 levels from the surface up to a height of 80km. ERA5 includes information about uncertainties for all variables at reduced spatial and temporal resolutions.
All parameters are assumed non-negative. S(0), , , I2(0), and R(0) define the initial population sizes. Dashes are used when values are arbitrarily chosen from some range.
https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/cc-by/cc-by_f24dc630aa52ab8c52a0ac85c03bc35e0abc850b4d7453bdc083535b41d5a5c3.pdf
ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product. This catalogue entry provides post-processed ERA5 hourly single-level data aggregated to daily time steps. In addition to the data selection options found on the hourly page, the following options can be selected for the daily statistic calculation:
• The daily aggregation statistic (daily mean, daily max, daily min, daily sum*)
• The sub-daily frequency sampling of the original data (1 hour, 3 hours, 6 hours)
• The option to shift to any local time zone in UTC (no shift means the statistic is computed from UTC+00:00)
*The daily sum is only available for the accumulated variables (see ERA5 documentation for more details). Users should be aware that the daily aggregation is calculated during the retrieval process and is not part of a permanently archived dataset. For more details on how the daily statistics are calculated, including demonstrative code, please see the documentation. For more details on the hourly data used to calculate the daily statistics, please refer to the ERA5 hourly single-level data catalogue entry and the documentation found therein.
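The three options can be illustrated with a small standard-library sketch. It is not the catalogue's own implementation (the documentation referred to above has the demonstrative code); the function name, the parameter names and the choice to apply the sub-daily sampling to the UTC hour before shifting are assumptions made here.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from statistics import mean

STATISTICS = {"daily_mean": mean, "daily_max": max, "daily_min": min, "daily_sum": sum}


def daily_statistic(hourly, statistic="daily_mean", frequency_hours=1, utc_offset_hours=0):
    """Aggregate (UTC datetime, value) pairs to one value per local calendar day."""
    reducer = STATISTICS[statistic]
    shift = timedelta(hours=utc_offset_hours)
    by_day = defaultdict(list)
    for timestamp, value in hourly:
        if timestamp.hour % frequency_hours != 0:
            continue  # keep every 1st, 3rd or 6th hourly step (sampled on the UTC hour)
        by_day[(timestamp + shift).date()].append(value)
    return {day: reducer(values) for day, values in sorted(by_day.items())}


# Made-up series: 48 hourly values equal to the hour index, starting 2024-01-01 00:00 UTC.
start = datetime(2024, 1, 1)
hourly = [(start + timedelta(hours=h), float(h)) for h in range(48)]
utc_means = daily_statistic(hourly)
shifted_max = daily_statistic(hourly, "daily_max", frequency_hours=6, utc_offset_hours=2)
print(utc_means)
print(shifted_max)
```

Note that a non-zero shift leaves the first and last local days only partly covered by the input, so a real workflow would need to request extra hours at both ends.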
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Predicting the fraction unbound in plasma provides a good understanding of the pharmacokinetic properties of a drug to assist candidate selection in the early stages of drug discovery. It is also an effective tool to mitigate the risk of late-stage attrition and to optimize further screening. In this study, we built in silico prediction models of fraction unbound in human plasma with freely available software, aiming specifically to improve the accuracy in the low value ranges. We employed several machine learning techniques and built prediction models trained on the largest ever data set of 2738 experimental values. The classification model showed a high true positive rate of 0.826 for the low fraction unbound class on the test set. The strongly biased distribution of the fraction unbound in plasma was mitigated by a logarithmic transformation in the regression model, leading to improved accuracy at lower values. Overall, our models showed better performance than those of previously published methods, including commercial software. Our prediction tool can be used on its own or integrated into other pharmacokinetic modeling systems.
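A brief sketch of why a logarithmic transformation helps at low values: on a log scale, the gap between 0.01 and 0.02 becomes much larger than the gap between 0.90 and 0.91, so a regression loss no longer ignores the strongly bound compounds. The abstract does not state the base of the logarithm or how zeros were handled, so base 10, the `floor` value and the function names below are assumptions.

```python
import math


def to_log_scale(fraction_unbound, floor=1e-4):
    """log10 of the fraction unbound; floor is a hypothetical guard against log(0)."""
    return math.log10(max(fraction_unbound, floor))


def from_log_scale(log_value):
    """Invert the transform and cap at 1.0, since a fraction cannot exceed 1."""
    return min(10.0 ** log_value, 1.0)


low_gap = to_log_scale(0.02) - to_log_scale(0.01)    # same 0.01 linear difference ...
high_gap = to_log_scale(0.91) - to_log_scale(0.90)   # ... but far smaller on a log scale
print(round(low_gap, 3), round(high_gap, 4))
```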
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
*Values in parentheses are the standard values used for numerical analysis of the model.
https://www.gesis.org/en/institute/data-usage-terms
The European Values Study is a large-scale, cross-national and longitudinal survey research program on how Europeans think about family, work, religion, politics, and society. Repeated every nine years in an increasing number of countries, the survey provides insights into the ideas, beliefs, preferences, attitudes, values, and opinions of citizens all over Europe.
Like the previous waves conducted in 1981, 1990, 1999, and 2008, the fifth EVS wave maintains a persistent focus on a broad range of values. Questions are highly comparable across waves and regions, making EVS suitable for research aimed at studying trends over time.
The new wave has seen a strengthening of the methodological standards. The full release of the EVS 2017 includes data and documentation of altogether 37 participating countries. For more information, please go to the EVS website.
Moral, religious, societal, political, work, and family values of Europeans.
Topics: 1. Perceptions of life: importance of work, family, friends and acquaintances, leisure time, politics and religion; happiness; self-assessment of own health; memberships in voluntary organisations (religious or church organisations, cultural activities, trade unions, political parties or groups, environment, ecology, animal rights, professional associations, sports, recreation, or other groups, none); active or inactive membership of humanitarian or charitable organisation, consumer organisation, self-help group or mutual aid; voluntary work in the last six months; tolerance towards minorities (people of a different race, heavy drinkers, immigrants, foreign workers, drug addicts, homosexuals, Christians, Muslims, Jews, and gypsies - social distance); trust in people; estimation of people's fair and helpful behavior; internal or external control; satisfaction with life; importance of educational goals: desirable qualities of children.
Work: attitude towards work (job needed to develop talents, receiving money without working is humiliating, people turn lazy not working, work is a duty towards society, work always comes first); importance of selected aspects of occupational work; give priority to nationals over foreigners as well as men over women in jobs.
Religion and morals: religious denomination; current and former religious denomination; current frequency of church attendance and at the age of 12; self-assessment of religiousness; belief in God, life after death, hell, heaven, and re-incarnation; personal god vs. spirit or life force; importance of God in one's life (10-point-scale); frequency of prayers; moral attitudes (scale: claiming state benefits without entitlement, cheating on taxes, taking soft drugs, accepting a bribe, homosexuality, abortion, divorce, euthanasia, suicide, paying cash to avoid taxes, casual sex, avoiding fare on public transport, prostitution, in-vitro fertilization, political violence, death penalty).
Family: trust in family; most important criteria for a successful marriage or partnership (faithfulness, adequate income, good housing, sharing household chores, children, time for friends and personal hobbies); marriage is an outdated institution; attitude towards traditional understanding of one's role of man and woman in occupation and family (gender roles); homosexual couples are as good parents as other couples; duty towards society to have children; responsibility of adult children for their parents when they are in need of long-term care; to make own parents proud is a main goal in life.
Politics and society: political interest; political participation; preference for individual freedom or social equality; self-assessment on a left-right continuum (10-point-scale) (left-right self-placement); individual vs. state responsibility for providing; take any job vs. right to refuse job when unemployed; competition good vs. harmful for people; equal incomes vs. incentives for individual effort; private vs. government ownership of business and industry; postmaterialism (scale); most important aims of the country for the next ten years; willingness to fight for the country; expectation of future development (less importance placed on work and greater respect for authority); trust in institutions; essential characteristics of democracy; importance of democracy for the respondent; rating democracy in own country; satisfaction with the political system in the country; preferred type of political system (strong leader, expert decisions, army should ...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Description
This dataset contains a simulated collection of 10,000 patient records designed to explore hypertension management in resource-constrained settings. It provides comprehensive data for analyzing blood pressure control rates, associated risk factors, and complications. The dataset is ideal for predictive modelling, risk analysis, and treatment optimization, offering insights into demographic, clinical, and treatment-related variables.
Dataset Structure
Dataset Volume
• Size: 10,000 records. • Features: 19 variables, categorized into Sociodemographic, Clinical, Complications, and Treatment/Control groups.
Variables and Categories
A. Sociodemographic Variables
1. Age:
• Continuous variable in years.
• Range: 18–80 years.
• Mean ± SD: 49.37 ± 12.81.
2. Sex:
• Categorical variable.
• Values: Male, Female.
3. Education:
• Categorical variable.
• Values: No Education, Primary, Secondary, Higher Secondary, Graduate, Post-Graduate, Madrasa.
4. Occupation:
• Categorical variable.
• Values: Service, Business, Agriculture, Retired, Unemployed, Housewife.
5. Monthly Income:
• Categorical variable in Bangladeshi Taka.
• Values: <5000, 5001–10000, 10001–15000, >15000.
6. Residence:
• Categorical variable.
• Values: Urban, Sub-urban, Rural.
B. Clinical Variables
7. Systolic BP:
• Continuous variable in mmHg.
• Range: 100–200 mmHg.
• Mean ± SD: 140 ± 15 mmHg.
8. Diastolic BP:
• Continuous variable in mmHg.
• Range: 60–120 mmHg.
• Mean ± SD: 90 ± 10 mmHg.
9. Elevated Creatinine:
• Binary variable (≥ 1.4 mg/dL).
• Values: Yes, No.
10. Diabetes Mellitus:
• Binary variable.
• Values: Yes, No.
11. Family History of CVD:
• Binary variable.
• Values: Yes, No.
12. Elevated Cholesterol:
• Binary variable (≥ 200 mg/dL).
• Values: Yes, No.
13. Smoking:
• Binary variable.
• Values: Yes, No.
C. Complications
14. LVH (Left Ventricular Hypertrophy):
• Binary variable (ECG diagnosis).
• Values: Yes, No.
15. IHD (Ischemic Heart Disease):
• Binary variable.
• Values: Yes, No.
16. CVD (Cerebrovascular Disease):
• Binary variable.
• Values: Yes, No.
17. Retinopathy:
• Binary variable.
• Values: Yes, No.
D. Treatment and Control
18. Treatment:
• Categorical variable indicating therapy type.
• Values: Single Drug, Combination Drugs.
19. Control Status:
• Binary variable.
• Values: Controlled, Uncontrolled.
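The documented ranges and category values lend themselves to a simple record check before any modelling. The sketch below covers a subset of the 19 variables; the dictionary keys (`age`, `systolic_bp`, and so on) are hypothetical field names, since the description does not give the actual column names, while the ranges and allowed values are the ones listed above.

```python
# Field names are hypothetical; ranges and categories follow the variable list.
NUMERIC_RANGES = {
    "age": (18, 80),
    "systolic_bp": (100, 200),
    "diastolic_bp": (60, 120),
}
CATEGORIES = {
    "sex": {"Male", "Female"},
    "residence": {"Urban", "Sub-urban", "Rural"},
    "treatment": {"Single Drug", "Combination Drugs"},
    "control_status": {"Controlled", "Uncontrolled"},
}
BINARY_FIELDS = ("diabetes_mellitus", "smoking", "lvh", "ihd")


def validate_record(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, (low, high) in NUMERIC_RANGES.items():
        value = record.get(field)
        if not isinstance(value, (int, float)) or not low <= value <= high:
            problems.append(f"{field}={value!r} outside {low}-{high}")
    for field, allowed in CATEGORIES.items():
        if record.get(field) not in allowed:
            problems.append(f"{field}={record.get(field)!r} not in {sorted(allowed)}")
    for field in BINARY_FIELDS:
        if record.get(field) not in {"Yes", "No"}:
            problems.append(f"{field}={record.get(field)!r} is not Yes/No")
    return problems


# Made-up records for illustration only.
good_record = {
    "age": 52, "systolic_bp": 148, "diastolic_bp": 92,
    "sex": "Female", "residence": "Rural",
    "treatment": "Single Drug", "control_status": "Uncontrolled",
    "diabetes_mellitus": "No", "smoking": "No", "lvh": "No", "ihd": "Yes",
}
bad_record = dict(good_record, systolic_bp=250, sex="Other")
print(validate_record(good_record))
print(validate_record(bad_record))
```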
Dataset Applications
1. Predictive Modeling:
• Develop models to predict blood pressure control status using demographic and clinical data.
2. Risk Analysis:
• Identify significant factors influencing hypertension control and complications.
3. Severity Scoring:
• Quantify hypertension severity for patient risk stratification.
4. Complications Prediction:
• Forecast complications like IHD, LVH, and CVD for early intervention.
5. Treatment Guidance:
• Analyze therapy efficacy to recommend optimal treatment strategies.
(1) Vector, human and non-human hosts natural death rates were estimated as 1/individual longevity. The ranges of variation given are those of longevity (i.e. 1/death rate parameter defined in the model), as those are the raw data found in the literature (see sections ‘Vector local growth rate’ and ‘Human and non-human hosts natural death rates’ in Text S1). (2) Death rates were calculated as the sum of the natural death rate of human or non-human hosts and additional mortality imposed by the pathogen to infectious and ‘recovered’ individuals (as calculated in section ‘Human and non-human hosts mortality induced by the pathogen’ in Text S1).
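The two footnotes amount to a short calculation, sketched here with made-up numbers. The function names, the time unit and the example values are assumptions; the actual parameter values are in Text S1, which is not reproduced in this listing.

```python
def natural_death_rate(longevity):
    """Footnote (1): natural death rate = 1 / individual longevity."""
    if longevity <= 0:
        raise ValueError("longevity must be positive")
    return 1.0 / longevity


def total_death_rate(longevity, pathogen_induced_mortality=0.0):
    """Footnote (2): natural death rate plus the additional pathogen-induced mortality."""
    return natural_death_rate(longevity) + pathogen_induced_mortality


# Hypothetical host with a longevity of 50 time units and an extra mortality of 0.01.
example_natural = natural_death_rate(50.0)
example_total = total_death_rate(50.0, 0.01)
print(example_natural, example_total)
```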
This dataset contains whole-rock major element geochemical data used to calculate values of the chemical index of alteration (CIA), data for Nd, Sm, Y, and total REE and expected ranges for total REEY for samples of regolith overlying the Stewartsville pluton, Virginia. The southeastern United States was first identified as prospective for regolith-hosted REE deposits based on the recognition that the region has been subjected to a long history of intense differential chemical weathering and saprolitization, comparable to that which formed the REE clay deposits of South China and Southeast Asia since the break-up of Pangea (Foley and Ayuso, 2013). Foley et al. (2014) established that due to their inherent high concentrations of REE, anorogenic (A-type) and highly fractionated igneous (I-type) granitic rocks of the southeastern United States were highly prospective source rocks for deposits of this type. More recently, additional studies investigated accumulation processes resulting in high concentrations of REE in granite-derived regolith deposits related to the Stewartsville pluton and other plutons in Virginia. The Stewartsville pluton was emplaced along the flank of the Blue Ridge province during regional crustal extension related to the opening of the Iapetus Ocean and breakup of the supercontinent Rodinia. The studied rock samples consist of medium- to coarse-grained biotite granite and are mineralogically complex. They contain phenocrysts of quartz, sericitized and albitized K-feldspar, sodic plagioclase, and mafic clots and stringers that are composed primarily of biotite and stilpnomelane and, less typically, include magnetite and remnant cores of green and green-brown hornblende.
Feldspar contains inclusions of synchysite and fergusonite; other accessory minerals include abundant and diagnostic allanite and fluorite, as well as apatite, epidote, garnet, Nb-rutile, fergusonite, monazite, titanite, xenotime, gadolinite, and zircon (Foley and Ayuso, 2015 and references therein). Granite outcrop exposures in the Piedmont and Blue Ridge areas of Virginia tend to be intensely weathered, with overlying regoliths ranging from thin and discontinuous to meters thick and laterally extensive, and often with overlying B-horizon type soils. Saprolite can extend down to depths of tens of meters below the B-horizon. In the case of the Stewartsville Pluton, regolith is well developed in multiple exposures. The sampled section described in this data release is >20 meters high by >60 meters long. The profile includes nearly fresh rock, partially to highly weathered saprolite, indurated gravels and sands, and poorly delineated layers of subsoil and topsoil. Granite at the base of the profile is iron stained (mostly goethite) and weathered on exposed surfaces and along cracks. Partially weathered sections of the outcrop display a range of rock textures throughout, rather than systematic changes from base to surface. For example, in the lower parts, cobble and boulder-sized relics of spheroidally weathered granite knobs retain distinctive primary textures but are surrounded by nearly disaggregated granite that crumbles to sand and gravel-sized fragments when sampled. Subsoils, mainly B-horizon, comprise the uppermost meter of the section and contain a higher proportion of clay minerals (i.e. kaolinite-nontronite-iron-oxide mixtures) than the underlying saprolite.
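For readers unfamiliar with the CIA, the sketch below uses its usual definition: 100 × Al2O3 / (Al2O3 + CaO* + Na2O + K2O) in molar proportions, where CaO* is the CaO held in silicate minerals. The data release does not spell out its exact procedure, so the formula as applied here, the assumption that the CaO input is already corrected to CaO*, and both example compositions are illustrative only.

```python
# Molar masses in g/mol, rounded to two decimals.
MOLAR_MASS = {"Al2O3": 101.96, "CaO": 56.08, "Na2O": 61.98, "K2O": 94.20}


def chemical_index_of_alteration(wt_percent):
    """CIA from oxide weight percentages; CaO is assumed to be silicate-bound (CaO*)."""
    moles = {oxide: wt_percent[oxide] / MOLAR_MASS[oxide] for oxide in MOLAR_MASS}
    return 100.0 * moles["Al2O3"] / sum(moles.values())


# Made-up compositions, not values from the dataset.
fresh_granite = {"Al2O3": 14.0, "CaO": 1.5, "Na2O": 3.5, "K2O": 5.0}
weathered_saprolite = {"Al2O3": 25.0, "CaO": 0.1, "Na2O": 0.2, "K2O": 1.0}
cia_fresh = chemical_index_of_alteration(fresh_granite)
cia_weathered = chemical_index_of_alteration(weathered_saprolite)
print(round(cia_fresh, 1), round(cia_weathered, 1))
```

With these inputs the fresh rock sits near 50 and the clay-rich material above 90, reflecting the loss of Ca, Na and K relative to Al during weathering.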
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset represents a thoroughly transformed and enriched version of a publicly available customer shopping dataset. It has undergone comprehensive processing to ensure it is clean, privacy-compliant, and enriched with new features, making it highly suitable for advanced analytics, machine learning, and business research applications.
The transformation process focused on creating a high-quality dataset that supports robust customer behavior analysis, segmentation, and anomaly detection, while maintaining strict privacy through anonymization and data validation.
➡ Data Cleaning and Preprocessing : Duplicates were removed. Missing numerical values (Age, Purchase Amount, Review Rating) were filled with medians; missing categorical values labeled “Unknown.” Text data were cleaned and standardized, and numeric fields were clipped to valid ranges.
➡ Feature Engineering : New informative variables were engineered to augment the dataset’s analytical power. These include:
• Avg_Amount_Per_Purchase: Average purchase amount calculated by dividing total purchase value by the number of previous purchases, capturing spending behavior per transaction.
• Age_Group: Categorical age segmentation into meaningful bins such as Teen, Young Adult, Adult, Senior, and Elder.
• Purchase_Frequency_Score: Quantitative mapping of purchase frequency to annualized values to facilitate numerical analysis.
• Discount_Impact: Monetary quantification of discount application effects on purchases.
• Processing_Date: Timestamp indicating the dataset transformation date for provenance tracking.
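Three of the engineered features can be sketched in a few lines. The description gives the feature names but not the age cut points or the frequency labels, so the bin boundaries, the frequency mapping and the function names below are hypothetical stand-ins.

```python
# Hypothetical upper bounds (exclusive) for each age group; the source lists only the labels.
AGE_BINS = [(20, "Teen"), (35, "Young Adult"), (55, "Adult"), (70, "Senior")]
# Hypothetical frequency labels mapped to purchases per year.
FREQUENCY_PER_YEAR = {"Weekly": 52, "Monthly": 12, "Quarterly": 4, "Annually": 1}


def age_group(age):
    for upper_bound, label in AGE_BINS:
        if age < upper_bound:
            return label
    return "Elder"


def avg_amount_per_purchase(total_purchase_value, previous_purchases):
    """Total purchase value divided by the number of previous purchases."""
    if previous_purchases == 0:
        return None  # undefined for customers without earlier purchases
    return total_purchase_value / previous_purchases


def purchase_frequency_score(frequency_label):
    return FREQUENCY_PER_YEAR.get(frequency_label)


print(age_group(17), age_group(42), age_group(80))
print(avg_amount_per_purchase(300.0, 4), purchase_frequency_score("Monthly"))
```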
➡ Data Filtering : Rows with ages outside 0–100 were removed. Only core categories (Clothing, Footwear, Outerwear, Accessories) and the top 25% of high-value customers by purchase amount were retained for focused analysis.
➡ Data Transformation : Key numeric features were standardized, and log transformations were applied to skewed data to improve model performance.
➡ Advanced Features : Created a category-wise average purchase and a loyalty score combining purchase frequency and volume.
➡ Segmentation & Anomaly Detection : Used KMeans to cluster customers into four groups and Isolation Forest to flag anomalies.
➡ Text Processing : Cleaned text fields and added a binary indicator for clothing items.
➡ Privacy : Hashed Customer ID and removed sensitive columns like Location to ensure privacy.
➡ Validation : Automated checks for data integrity, including negative values and valid ranges.
This transformed dataset supports a wide range of research and practical applications, including customer segmentation, purchase behavior modeling, marketing strategy development, fraud detection, and machine learning education. It serves as a reliable and privacy-aware resource for academics, data scientists, and business analysts.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of South Range by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for South Range. The dataset can be utilized to understand the population distribution of South Range by gender and age. For example, using this dataset, we can identify the largest age group for both men and women in South Range. Additionally, it can be used to see how the gender ratio changes from birth to the senior-most age group and to compare the male-to-female ratio across age groups for South Range.
Key observations
Largest age group (population): Male # 20-24 years (49) | Female # 20-24 years (50). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
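As a small worked example, the ratio for the largest age group can be computed from the two counts quoted above. The description does not define how its gender ratio is expressed, so the "males per 100 females" convention and the function name are assumptions.

```python
def males_per_100_females(male_population, female_population):
    """Gender ratio as males per 100 females (an assumed convention)."""
    if female_population == 0:
        return None  # ratio undefined when no females are recorded
    return 100.0 * male_population / female_population


# Counts for the 20-24 age group as listed under Key observations.
ratio_20_24 = males_per_100_females(49, 50)
print(ratio_20_24)
```

Because these are survey estimates with a margin of error, a ratio computed from counts this small should be read as indicative only.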
Age groups:
Scope of gender :
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for South Range Population by Gender.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains comprehensive stock market data for June 2025, capturing daily trading information across multiple companies and sectors. The dataset represents a substantial collection of market data with detailed financial metrics and trading statistics.
| Attribute | Details |
|---|---|
| Dataset Name | Stock Market Data - June 2025 |
| File Format | CSV |
| File Size | ~2.5 MB |
| Number of Records | 11,600+ |
| Number of Features | 13 |
| Time Period | June 1-21, 2025 |
| Column Name | Data Type | Description | Example Values |
|---|---|---|---|
| Date | Date | Trading date in DD-MM-YYYY format | 01-06-2025, 02-06-2025 |
| Ticker | String | Stock ticker symbol (3-4 characters) | AAPL, GOOGL, TSLA, SLH |
| Open Price | Float | Opening price of the stock | 34.92, 206.5, 125.1 |
| Close Price | Float | Closing price of the stock | 34.53, 208.45, 124.03 |
| High Price | Float | Highest price during the trading day | 35.22, 210.51, 127.4 |
| Low Price | Float | Lowest price during the trading day | 34.38, 205.12, 121.77 |
| Volume Traded | Integer | Number of shares traded | 2,966,611, 1,658,738 |
| Market Cap | Float | Market capitalization in dollars | 57,381,363,838.88 |
| PE Ratio | Float | Price-to-Earnings ratio | 29.63, 13.03, 29.19 |
| Dividend Yield | Float | Dividend yield percentage | 2.85, 2.73, 2.64 |
| EPS | Float | Earnings per Share | 1.17, 16.0, 4.25 |
| 52 Week High | Float | Highest price in the last 52 weeks | 39.39, 227.38, 138.35 |
| 52 Week Low | Float | Lowest price in the last 52 weeks | 28.44, 136.79, 100.69 |
| Sector | String | Industry sector classification | Industrials, Energy, Healthcare |
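A minimal loading sketch follows, mainly to show the DD-MM-YYYY date handling. It assumes the CSV header matches the column names in the table; the inline sample, the tickers `AAA` and `BBB`, and the pairing of example values into rows are made up so that the snippet runs without the actual file.

```python
import csv
import io
from datetime import date, datetime

# Made-up sample using a subset of the columns; values borrow from the examples in the table.
SAMPLE_CSV = """Date,Ticker,Open Price,Close Price,High Price,Low Price,Volume Traded,Sector
01-06-2025,AAA,34.92,34.53,35.22,34.38,2966611,Industrials
02-06-2025,BBB,206.5,208.45,210.51,205.12,1658738,Energy
"""


def parse_rows(csv_text):
    """Convert CSV text into typed dictionaries."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "date": datetime.strptime(raw["Date"], "%d-%m-%Y").date(),  # day first
            "ticker": raw["Ticker"],
            "open": float(raw["Open Price"]),
            "close": float(raw["Close Price"]),
            "high": float(raw["High Price"]),
            "low": float(raw["Low Price"]),
            "volume": int(raw["Volume Traded"].replace(",", "")),  # tolerate 2,966,611
            "sector": raw["Sector"],
        })
    return rows


rows = parse_rows(SAMPLE_CSV)
daily_ranges = {row["ticker"]: round(row["high"] - row["low"], 2) for row in rows}
print(rows[0]["date"], daily_ranges)
```

Parsing the date explicitly matters here because a day-first string such as 02-06-2025 would otherwise be easy to misread as 6 February.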
✅ Authentic Price Ranges: Based on realistic 2025 market projections
✅ Sector-Appropriate Volatility: Different volatility patterns by industry
✅ Correlated Metrics: P/E ratios, dividend yields, and EPS align with market caps
✅ Realistic Trading Volumes: Volume scaled appropriately to market cap
✅ Temporal Consistency: Logical price progression over 53-day period
✅ Market Cap Accuracy: Daily fluctuations reflect actual price movements
This dataset provides a comprehensive foundation for quantitative finance research, offering both breadth across market sectors and depth in daily trading dynamics while maintaining statistical realism throughout the observation period...