MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Data types
sequence: 30 datapoints
structure: 30 datapoints
Conversion report
Over a total of 30 datapoints, there are:
OUTPUT
ALL: 30 valid datapoints
INCLUDED: 6 duplicate sequences with different structure / dms / shape
MODIFIED
0 multiple sequences with the same reference (renamed reference)
FILTERED OUT
0 invalid datapoints (ex: sequence with non-regular characters)
0 datapoints with bad structures
0 duplicate sequences with… See the full description on the dataset page: https://huggingface.co/datasets/rouskinlab/lncRNA.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
A new group contribution (GC) quantitative structure-property relationship (QSPR) for estimating the density (ρ) of pure ionic liquids (ILs) as a function of temperature (T) and pressure (p) is developed on the basis of the most comprehensive collection of volumetric data reported so far (in total 41,250 data points, deposited for 2267 ILs from diverse chemical families). The model was established on a carefully revised, evaluated, and reduced data set, whereas the adopted GC methodology follows the approach proposed previously [Ind. Eng. Chem. Res. 2012, 51, 591−604]. However, a novel approach is proposed to model both the temperature and pressure dependence. The idea consists of an independent representation of the reference density ρ0 at T0 = 298.15 K and p0 = 0.1 MPa and a dimensionless correction f(T, p) = ρ(T, p)/ρ0 for other conditions of temperature and pressure. Three common machine learning algorithms are employed to represent the quantitative structure-property relationship between the studied property end points, GCs, T, and p, namely, multiple linear regression, feed-forward artificial neural network, and least-squares support vector machine. On the basis of a detailed statistical analysis of the resulting models, including both internal and external stability checks by means of common statistical procedures such as cross-validation, y-scrambling, and "hold-out" testing, the final model is selected and recommended. The impact of the type of cation and anion on the accuracy of the calculations is highlighted and discussed. The performance of the new model is finally demonstrated by comparing it with similar methods published recently in the literature.
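In compact form, the temperature and pressure dependence described above can be written as follows (a restatement of the relation in the abstract, with p_0 denoting the reference pressure):

\rho(T, p) = \rho_0 \cdot f(T, p), \qquad \rho_0 = \rho(T_0, p_0), \qquad T_0 = 298.15\,\mathrm{K}, \quad p_0 = 0.1\,\mathrm{MPa}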
https://creativecommons.org/publicdomain/zero/1.0/
I read an article yesterday that got my mind storming. An article by the World Bank from August 15th, 2022 explains it better; it is quoted below.
I already have a project I have been working on since Feb 2021 that tries to solve this problem; it is listed in my datasets.
This dataset showcases statistics over the past 6-7 decades, covering the production of 150+ unique crops, 50+ livestock elements, land distribution by usage, and population. Aspiring data scientists can try to extract insights that incentivize the optimal use and distribution of natural resources.
Record high food prices have triggered a global crisis that will drive millions more into extreme poverty, magnifying hunger and malnutrition, while threatening to erase hard-won gains in development. The war in Ukraine, supply chain disruptions, and the continued economic fallout of the COVID-19 pandemic are reversing years of development gains and pushing food prices to all-time highs. Rising food prices have a greater impact on people in low- and middle-income countries, since they spend a larger share of their income on food than people in high-income countries. This brief looks at rising food insecurity and World Bank responses to date.
Seeds are a key pathway for plant population recovery following disturbance. To prevent germination during unsuitable conditions, most species produce dormant seeds. In fire-prone regions, physical dormancy (PY) enables seeds to germinate after fire. The thermal niche, incorporating seed dormancy and mortality temperature responses, has not been characterised for PY seeds from fire-prone environments. We aimed to assess variation in thermal thresholds between species with PY seeds and whether the pyro-thermal niche is aligned with seed mass, ecosystem type or phylogenetic relatedness. We collected post heat-shock germination data for 58 Australian species that produce PY seeds. We applied species-specific thermal performance curves to define three critical thresholds (DRT50, dormancy release temperature; Topt, optimum dormancy release temperature; and LT50, lethal temperature), which define the pyro-thermal niche. Each species was assigned a mean seed weight and ecosystem type. We constructed a p...

Species selection and data acquisition. We set out to acquire seed germination data following heat-shock for as many species as possible across temperate Australia. However, to provide accurate estimates of threshold conditions, we applied three rules that allowed data from multiple sources to be brought together: 1) seeds needed to be heated from ~40 °C through to at least 120 °C (or higher if there was no evidence of seed mortality at 120 °C), 2) seeds had to be heated for either 5 min and/or 10 min, and 3) there were more than two non-zero germination data points within the treatments (e.g., seeds treated at 40 °C, 60 °C, 80 °C, 100 °C and 120 °C, but only recording ~0% germination at 80 and 100 °C, would be discarded). Where there were insufficient data points to model the response, these datasets were removed, as fitting response curves to two data points induces significant error into hypothetical responses. However, this was a rare occurrence, and only appeared once across all s...

# Data from: Defining the pyro-thermal niche: do seed traits, ecosystem type and phylogeny influence thermal thresholds in seeds with physical dormancy
https://doi.org/10.5061/dryad.j9kd51cm3
This dataset provides the full data requirements to implement the "pyrothermal niche" markdown file in R. It also includes model fitting figures for each of the species used in the analysis.
Description of the data and file structure
The script provided calls upon each of the included .csv files and the phylogenetic tree. Most of the csv files are subsets of the two main datasets, the germination data and master data.csv. The figures included here are directly associated with the species-specific model selection undertaken in the manuscript. Each figure indicates the fit of the 7 models used and includes suitability metrics for each model. For furt...
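As an illustration only (the repository's own analysis is the R markdown file mentioned above, and the 7 candidate models are not listed here), a hedged Python sketch of the general idea: fit a thermal performance curve to heat-shock germination data and read a threshold temperature from it. The functional form, file name, and column names are assumptions, not the manuscript's models:

import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

# Hypothetical file and column names; the repository's germination CSVs differ.
df = pd.read_csv("germination_data.csv")
temp = df["treatment_temperature_c"].to_numpy(dtype=float)
germ = df["germination_proportion"].to_numpy(dtype=float)

def thermal_performance(t, peak, t_opt, width):
    # A simple Gaussian-shaped performance curve; an assumption for illustration,
    # not one of the 7 models used in the manuscript.
    return peak * np.exp(-((t - t_opt) ** 2) / (2 * width ** 2))

params, _ = curve_fit(thermal_performance, temp, germ, p0=[1.0, 80.0, 20.0])
peak, t_opt, width = params

# Under this fitted curve, Topt is the temperature of maximum dormancy release.
print(f"Topt ~ {t_opt:.1f} C, peak germination ~ {peak:.2f}")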
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Repository with supplemental data to:
Characterization of alpha and beta interactions in liquid xenon. Jörg, F., Cichon, D., Eurin, G. et al. Eur. Phys. J. C 82, 361 (2022) 10.1140/epjc/s10052-022-10259-3
A pre-print of the article is available on arXiv: 2109.13735
Note: When re-using the data, please make sure to cite the article (and not only the dataset)
The files contain the measured data points (as well as their statistical and systematic uncertainties) as shown in the publication.
All datasets are stored in the .csv format.
Minimum working example to plot the drift velocity using the 83mKr data:
import numpy as np
import matplotlib.pyplot as plt

# load the data set
data = np.loadtxt("20220427_drift_velocity_hexe_kr83m.csv", delimiter=",")

# Plot the systematic uncertainty on the drift field
plt.errorbar(data[:,0], data[:,2], xerr=data[:,1], fmt="o", capsize=2, ecolor="darkgray",
             alpha=0.7, elinewidth=3, color="black")

# Plot the actual data points
plt.errorbar(data[:,0], data[:,2], yerr=data[:,3], fmt="o", color="black")

# Label the axes and define the range
plt.ylabel("Drift Velocity [mm/µs]")
plt.xlabel("Drift Field [kV/cm]")
plt.xscale("log")
plt.xlim(0.006, 2)
plt.ylim(0, 2.4)
plt.show()
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Preliminary Dataset: the article has not yet been peer-reviewed. A pre-print is available on arXiv: 2109.13735
Repository with supplemental data to: Characterization of alpha and electron interactions in liquid xenon
Note: When re-using the data, please make sure to cite the article (and not only the dataset)
The files contain the measured data points (as well as their statistical and systematic uncertainties) as shown in the publication. All datasets are stored in the .csv format.
20210924_yields_hexe_kr83m.csv This file contains the normalized light and charge yields as a function of the applied field from the measurement with the 83mKr source. The data is shown in Figure 16 (dots) of the publication.
20210924_yields_hexe_rn222.csv This file contains the normalized light and charge yields as a function of the applied field from the measurement with the 222Rn source. The data is shown in Figure 17 (blue-ish points) of the publication.
20210924_drift_velocity_hexe_rn222.csv This file contains the measured electron drift velocity in liquid xenon at a temperature of 174.4 K as a function of the field. The data was acquired using the 222Rn source. The drift velocity is given in units of mm/µs, and the datapoints are shown in Figure 19 (black dots) of the publication.
20210924_drift_velocity_hexe_kr83m.csv This file contains the measured electron drift velocity in liquid xenon at a temperature of 174.4 K as a function of the field. The data was acquired using the 83mKr source. The drift velocity is given in units of mm/µs; these points are not displayed in the publication for visibility reasons.
Minimum working example to plot the drift velocity using the 83mKr data:
import numpy as np
import matplotlib.pyplot as plt

# load the data set
data = np.loadtxt("20210924_drift_velocity_hexe_kr83m.csv", delimiter=",")

# Plot the systematic uncertainty on the drift field
plt.errorbar(data[:,0], data[:,2], xerr=data[:,1], fmt="o", capsize=2, ecolor="darkgray",
             alpha=0.7, elinewidth=3, color="black")

# Plot the actual data points
plt.errorbar(data[:,0], data[:,2], yerr=data[:,3], fmt="o", color="black")

# Label the axes and define the range
plt.ylabel("Drift Velocity [mm/µs]")
plt.xlabel("Drift Field [kV/cm]")
plt.xscale("log")
plt.xlim(0.006, 2)
plt.ylim(0, 2.4)
plt.show()
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data contained in the zip files constitute the main research data of the publication entitled "Retarded room temperature Hamaker coefficients between bulk elemental metals". They are provided in .txt file format.
"Identical metals in vacuum.zip" contains the room temperature Hamaker coefficients between 26 identical elemental polycrystalline metals that are embedded in vacuum computed from the full Lifshitz theory as a function of the separation of the metallic semi-spaces within 0-200nm. The employed discretization scheme is the following: for l = 0 − 1 nm, (\Delta{l}=0.1\,nm) which corresponds to 11 data points, for l = 1−200 nm: (\Delta{l}=1\,nm) which corresponds to 200 data points. The computation of the imaginary argument dielectric function of metals is based on the full spectral method combined with a Drude model low frequency extrapolation technique which has been implemented with input from extended-in-frequency dielectric data that range from the far infra-red region to the soft X-ray region of the electromagnetic spectrum.
"Identical metals in water (Fiedler et al).zip" contains the room temperature Hamaker coefficients between 26 identical elemental polycrystalline metals that are embedded in pure water computed from the full Lifshitz theory as a function of the separation of the metallic semi-spaces within 0-200nm. The employed discretization scheme is the following: for l = 0 − 1 nm, (\Delta{l}=0.1\,nm) which corresponds to 11 data points, for l = 1−200 nm: (\Delta{l}=1\,nm) which corresponds to 200 data points. The computation of the imaginary argument dielectric function of metals is based on the full spectral method combined with a Drude model low frequency extrapolation technique which has been implemented with input from extended-in-frequency dielectric data that range from the far infra-red region to the soft X-ray region of the electromagnetic spectrum. The computation of the imaginary argument dielectric function of pure water is based on the simple spectral method which has been implemented with input from the Fiedler et al. dielectric parameterization.
"Identical metals in water (Parsegian-Weiss).zip" contains the room temperature Hamaker coefficients between 26 identical elemental polycrystalline metals that are embedded in pure water computed from the full Lifshitz theory as a function of the separation of the metallic semi-spaces within 0-200nm. The employed discretization scheme is the following: for l = 0 − 1 nm, (\Delta{l}=0.1\,nm) which corresponds to 11 data points, for l = 1−200 nm: (\Delta{l}=1\,nm) which corresponds to 200 data points. The computation of the imaginary argument dielectric function of metals is based on the full spectral method combined with a Drude model low frequency extrapolation technique which has been implemented with input from extended-in-frequency dielectric data that range from the far infra-red region to the soft X-ray region of the electromagnetic spectrum. The computation of the imaginary argument dielectric function of pure water is based on the simple spectral method which has been implemented with input from the Parsegian-Weiss dielectric parameterization.
"Identical metals in water (Roth-Lenhoff).zip" contains the room temperature Hamaker coefficients between 26 identical elemental polycrystalline metals that are embedded in pure water computed from the full Lifshitz theory as a function of the separation of the metallic semi-spaces within 0-200nm. The employed discretization scheme is the following: for l = 0 − 1 nm, (\Delta{l}=0.1\,nm) which corresponds to 11 data points, for l = 1−200 nm: (\Delta{l}=1\,nm) which corresponds to 200 data points. The computation of the imaginary argument dielectric function of metals is based on the full spectral method combined with a Drude model low frequency extrapolation technique which has been implemented with input from extended-in-frequency dielectric data that range from the far infra-red region to the soft X-ray region of the electromagnetic spectrum. The computation of the imaginary argument dielectric function of pure water is based on the simple spectral method which has been implemented with input from the Roth-Lenhoff dielectric parameterization.
All Hamaker coefficients are given in zJ and all separations are given in nm.
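A minimal loading sketch, assuming each extracted .txt file contains two whitespace-separated columns (separation in nm, Hamaker coefficient in zJ) as described above; the file name used here is hypothetical:

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical file name taken from one of the extracted zip archives;
# adjust to the actual per-metal file names inside the archives.
separation_nm, hamaker_zJ = np.loadtxt("Au_in_vacuum.txt", unpack=True)

# 11 points for l = 0-1 nm (step 0.1 nm) plus 200 points for l = 1-200 nm (step 1 nm)
print(f"{len(separation_nm)} data points loaded")

plt.plot(separation_nm, hamaker_zJ, marker="o", linestyle="-")
plt.xlabel("Separation l [nm]")
plt.ylabel("Hamaker coefficient [zJ]")
plt.xscale("log")
plt.show()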
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The uploaded data set was generated by the fuzzy model published in T. Galli, F. Chiclana, and F. Siewe. Genetic algorithm-based fuzzy inference system for describing execution tracing quality. Mathematics, 9(21), 2021. ISSN 2227-7390. doi: https://doi.org/10.3390/math9212822. URL https://www.mdpi.com/2571-5577/4/1/20.
The goal of the data generation is to make the published model available in the form of data points in a 5D space, which facilitates the construction of simpler models to approximate the original model. The names of the columns in the .csv file correspond to the quality properties of execution tracing: (1) accuracy, (2) legibility, (3) implementation, and (4) security, while column (5) contains the execution tracing quality derived from the fuzzy model. The indices in brackets give the column indices in the .csv file.
All variables lie in the continuous range [0, 100], where 100 means the best possible quality and 0 the complete lack of quality or of the given quality property. While generating the data, each of the 4 inputs was increased from 0 to 100 inclusive with a step size of 5, i.e. 21 values per input, and the model's output was collected for every combination (21^4 = 194,481 data points).
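A minimal sketch of how such a grid could be enumerated, assuming a callable fuzzy_model(accuracy, legibility, implementation, security) stands in for the published fuzzy inference system (the function body and the output file name are hypothetical placeholders):

import itertools
import csv

def fuzzy_model(accuracy, legibility, implementation, security):
    # Placeholder for the published fuzzy inference system;
    # it should return an execution tracing quality value in [0, 100].
    return (accuracy + legibility + implementation + security) / 4.0

values = range(0, 101, 5)  # 21 values per input: 0, 5, ..., 100

with open("execution_tracing_quality_grid.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["accuracy", "legibility", "implementation", "security", "quality"])
    for a, l, i, s in itertools.product(values, repeat=4):  # 21**4 = 194481 rows
        writer.writerow([a, l, i, s, fuzzy_model(a, l, i, s)])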
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Database linking crystal structure, Materials Project ID, and experimental enthalpy of hydride formation in various metals/intermetallics. Information on the source of the experimental value is provided, along with the DOI where available. The data is also labeled according to the data_set value, where 1 labels data points used in training, 2 labels data points used for validation, and 3 labels data points used in the test. data_set = 0 marks data points not used in model development. In addition, the model and scaler are provided. More details can be found in K. Batalovic et al., 'Predicting heat of hydride formation by the graph neural network – exploring structure-property relation for metal hydrides'.
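A minimal sketch of splitting the database by the data_set label described above, assuming the records are available as a CSV file (the file name used here is hypothetical):

import pandas as pd

# Hypothetical file name; the column "data_set" follows the labeling described above.
df = pd.read_csv("hydride_formation_database.csv")

train = df[df["data_set"] == 1]       # used in training
validation = df[df["data_set"] == 2]  # used for validation
test = df[df["data_set"] == 3]        # used in the test
unused = df[df["data_set"] == 0]      # not used in model development

print(len(train), len(validation), len(test), len(unused))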
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective: To analyze the influence path of the interaction between the unit environment, achievement transformation willingness, and achievement transformation cognition on achievement transformation output, in order to provide a basis for optimizing the achievement transformation environment of medical and health institutions and improving the efficiency of the transformation of scientific and technological achievements.
Methods: Through a questionnaire survey, 292 data points were obtained. SPSS 20.0 was used to conduct cross-table chi-square analysis and binary logistic regression analysis on the willingness, cognition, and output of scientific and technological achievements transformation. The PROCESS 14.0 plug-in was used to analyze the mediating effect of transformation cognition and the moderating effect of the unit environment.
Results: Achievement transformation willingness has a significant positive impact on achievement transformation cognition and achievement transformation output. Achievement transformation cognition has a positive impact on achievement transformation output, and the mediating effect of transformation cognition is significant and partial. The unit environment has a negative moderating effect on the influence of achievement transformation willingness on achievement transformation cognition, and a positive moderating effect on the influence of achievement transformation willingness on achievement transformation output.
Conclusion: Personal factors and the unit's achievement transformation environment have an obvious influence on the willingness and cognition of achievement transformation. It is necessary to optimize the unit environment for the transformation of scientific and technological achievements, improve the policy system through which medical and health personnel benefit from the transformation of scientific and technological achievements, implement the disposal, income, and distribution rights of medical and health institutions over the transformation of scientific and technological achievements, and stimulate the enthusiasm of medical and health personnel for the transformation of scientific and technological achievements.
Mass distributions for selected $D^0 D^0$ candidates with the $D^0$ background subtracted. Uncertainties on the data points are statistical only...
The household incomes chart shows how many households fall into each of the income brackets specified by Statistics Canada.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CESNET-TimeSeries24: The dataset for network traffic forecasting and anomaly detection
The dataset called CESNET-TimeSeries24 was collected by long-term monitoring of selected statistical metrics for 40 weeks for each IP address on the ISP network CESNET3 (Czech Education and Science Network). The dataset encompasses network traffic from more than 275,000 active IP addresses assigned to a wide variety of devices, including office computers, NATs, servers, WiFi routers, honeypots, and video-game consoles found in dormitories. Moreover, the dataset is rich in network anomaly types, since it contains all types of anomalies, ensuring a comprehensive evaluation of anomaly detection methods. Last but not least, the CESNET-TimeSeries24 dataset provides traffic time series on institutional and IP-subnet levels to cover all possible anomaly detection or forecasting scopes. Overall, the time series dataset was created from 66 billion IP flows that contain 4 trillion packets carrying approximately 3.7 petabytes of data. The CESNET-TimeSeries24 dataset is a complex real-world dataset that will bring insights into the evaluation of forecasting models in real-world environments.
Please cite the usage of our dataset as:
Koumar, J., Hynek, K., Čejka, T. et al. CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting. Sci Data 12, 338 (2025). https://doi.org/10.1038/s41597-025-04603-x

@Article{cesnettimeseries24,
  author={Koumar, Josef and Hynek, Karel and {\v{C}}ejka, Tom{\'a}{\v{s}} and {\v{S}}i{\v{s}}ka, Pavel},
  title={CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting},
  journal={Scientific Data},
  year={2025},
  month={Feb},
  day={26},
  volume={12},
  number={1},
  pages={338},
  issn={2052-4463},
  doi={10.1038/s41597-025-04603-x},
  url={https://doi.org/10.1038/s41597-025-04603-x}
}
Time series
We create evenly spaced time series for each IP address by aggregating IP flow records into time series datapoints. The created datapoints represent the behavior of IP addresses within a defined time window of 10 minutes. The vector of time-series metrics v_{ip, i} describes the IP address ip in the i-th time window. Thus, IP flows for vector v_{ip, i} are captured in time windows starting at t_i and ending at t_{i+1}. The time series are built from these datapoints.
Datapoints created by the aggregation of IP flows contain the following time-series metrics:
Simple volumetric metrics: the number of IP flows, the number of packets, and the transmitted data size (i.e. number of bytes)
Unique volumetric metrics: the number of unique destination IP addresses, the number of unique destination Autonomous System Numbers (ASNs), and the number of unique destination transport layer ports. The aggregation of unique volumetric metrics is memory intensive since all unique values must be stored in an array. We used a server with 41 GB of RAM, which was enough for 10-minute aggregation on the ISP network.
Ratios metrics: the ratio of UDP/TCP packets, the ratio of UDP/TCP transmitted data size, the direction ratio of packets, and the direction ratio of transmitted data size
Average metrics: the average flow duration, and the average Time To Live (TTL)
Multiple time aggregation: The original datapoints in the dataset are aggregated over 10 minutes of network traffic. The size of the aggregation interval influences anomaly detection procedures, mainly the training speed of the detection model. However, 10-minute intervals can be too short for longitudinal anomaly detection methods. Therefore, we added two more aggregation intervals to the dataset: 1 hour and 1 day.
Time series of institutions: We identify 283 institutions inside the CESNET3 network. These time series, aggregated per institution ID, provide a view of each institution's data.
Time series of institutional subnets: We identify 548 institution subnets inside the CESNET3 network. These time series, aggregated per institution subnet, provide a view of each subnet's data.
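A minimal sketch of the 10-minute aggregation described above, assuming IP flow records are available as a pandas DataFrame with hypothetical columns src_ip, time_start, packets, bytes, dst_ip, dst_asn, dst_port, protocol, and duration; it illustrates the metric definitions, not the original processing pipeline:

import pandas as pd

def aggregate_10min(flows: pd.DataFrame) -> pd.DataFrame:
    flows = flows.copy()
    # Assign each flow to a 10-minute window based on its start time.
    flows["window"] = flows["time_start"].dt.floor("10min")
    # Packets carried by TCP flows only, used for the TCP/UDP ratio below
    # (assumes every flow is either TCP or UDP).
    flows["tcp_packets"] = flows["packets"].where(flows["protocol"] == "TCP", 0)

    datapoints = flows.groupby(["src_ip", "window"]).agg(
        n_flows=("packets", "size"),
        n_packets=("packets", "sum"),
        n_bytes=("bytes", "sum"),
        n_dest_ip=("dst_ip", "nunique"),
        n_dest_asn=("dst_asn", "nunique"),
        n_dest_port=("dst_port", "nunique"),
        avg_duration=("duration", "mean"),
        tcp_packets=("tcp_packets", "sum"),
    )
    # 1 = all packets sent over TCP, 0 = all packets sent over UDP.
    datapoints["tcp_udp_ratio_packets"] = datapoints["tcp_packets"] / datapoints["n_packets"]
    return datapoints.drop(columns="tcp_packets").reset_index()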
Data Records
The file hierarchy is described below:
cesnet-timeseries24/
|- institution_subnets/
| |- agg_10_minutes/.csv
| |- agg_1_hour/.csv
| |- agg_1_day/.csv
| |- identifiers.csv
|- institutions/
| |- agg_10_minutes/.csv
| |- agg_1_hour/.csv
| |- agg_1_day/.csv
| |- identifiers.csv
|- ip_addresses_full/
| |- agg_10_minutes//.csv
| |- agg_1_hour//.csv
| |- agg_1_day//.csv
| |- identifiers.csv
|- ip_addresses_sample/
| |- agg_10_minutes/.csv
| |- agg_1_hour/.csv
| |- agg_1_day/.csv
| |- identifiers.csv
|- times/
| |- times_10_minutes.csv
| |- times_1_hour.csv
| |- times_1_day.csv
|- ids_relationship.csv
|- weekends_and_holidays.csv
The following list describes time series data fields in CSV files:
id_time: Unique identifier for each aggregation interval within the time series, used to segment the dataset into specific time periods for analysis.
n_flows: Total number of flows observed in the aggregation interval, indicating the volume of distinct sessions or connections for the IP address.
n_packets: Total number of packets transmitted during the aggregation interval, reflecting the packet-level traffic volume for the IP address.
n_bytes: Total number of bytes transmitted during the aggregation interval, representing the data volume for the IP address.
n_dest_ip: Number of unique destination IP addresses contacted by the IP address during the aggregation interval, showing the diversity of endpoints reached.
n_dest_asn: Number of unique destination Autonomous System Numbers (ASNs) contacted by the IP address during the aggregation interval, indicating the diversity of networks reached.
n_dest_port: Number of unique destination transport layer ports contacted by the IP address during the aggregation interval, representing the variety of services accessed.
tcp_udp_ratio_packets: Ratio of packets sent using TCP versus UDP by the IP address during the aggregation interval, providing insight into the transport protocol usage pattern. This metric belongs to the interval <0, 1> where 1 is when all packets are sent over TCP, and 0 is when all packets are sent over UDP.
tcp_udp_ratio_bytes: Ratio of bytes sent using TCP versus UDP by the IP address during the aggregation interval, highlighting the data volume distribution between protocols. This metric belongs to the interval <0, 1> with the same rule as tcp_udp_ratio_packets.
dir_ratio_packets: Ratio of packet directions (inbound versus outbound) for the IP address during the aggregation interval, indicating the balance of traffic flow directions. This metric belongs to the interval <0, 1>, where 1 is when all packets are sent in the outgoing direction from the monitored IP address, and 0 is when all packets are sent in the incoming direction to the monitored IP address.
dir_ratio_bytes: Ratio of byte directions (inbound versus outbound) for the IP address during the aggregation interval, showing the data volume distribution in traffic flows. This metric belongs to the interval <0, 1> with the same rule as dir_ratio_packets.
avg_duration: Average duration of IP flows for the IP address during the aggregation interval, measuring the typical session length.
avg_ttl: Average Time To Live (TTL) of IP flows for the IP address during the aggregation interval, providing insight into the lifespan of packets.
Moreover, the time series created by re-aggregation contain the following time series metrics instead of n_dest_ip, n_dest_asn, and n_dest_port:
sum_n_dest_ip: Sum of numbers of unique destination IP addresses.
avg_n_dest_ip: The average number of unique destination IP addresses.
std_n_dest_ip: Standard deviation of numbers of unique destination IP addresses.
sum_n_dest_asn: Sum of numbers of unique destination ASNs.
avg_n_dest_asn: The average number of unique destination ASNs.
std_n_dest_asn: Standard deviation of numbers of unique destination ASNs.
sum_n_dest_port: Sum of numbers of unique destination transport layer ports.
avg_n_dest_port: The average number of unique destination transport layer ports.
std_n_dest_port: Standard deviation of numbers of unique destination transport layer ports.
Moreover, files identifiers.csv in each dataset type contain IDs of time series that are present in the dataset. Furthermore, the ids_relationship.csv file contains a relationship between IP addresses, Institutions, and institution subnets. The weekends_and_holidays.csv contains information about the non-working days in the Czech Republic.
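A minimal loading sketch under the file layout above, assuming each per-entity CSV can be joined to the corresponding times file via the id_time field (the institution ID in the file name is hypothetical; the available IDs are listed in identifiers.csv):

import pandas as pd

# Hypothetical institution ID "42"; see institutions/identifiers.csv for valid IDs.
series = pd.read_csv("cesnet-timeseries24/institutions/agg_10_minutes/42.csv")
times = pd.read_csv("cesnet-timeseries24/times/times_10_minutes.csv")

# Attach the timestamp of each aggregation interval to the metric values.
series = series.merge(times, on="id_time", how="left")
print(series[["id_time", "n_flows", "n_packets", "n_bytes"]].head())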
The data is a synthetic univariate time series.
This data set is designed for testing indexing schemes in time series databases. The data appears highly periodic, but never exactly repeats itself. This feature is designed to challenge the indexing tasks.
This data set is designed for testing indexing schemes in time series databases. It is a much larger dataset than has been used in any published study (that we are currently aware of). It contains one million data points. The data has been split into 10 sections to facilitate testing (see below). We recommend building the index with 9 of the 100,000-datapoint sections and randomly extracting a query shape from the 10th section. (Some previously published work seems to have used queries that were also used to build the indexing structure; this will produce optimistic results.) The data are interesting because they have structure at different resolutions. Each of the 10 sections was generated by independent invocations of a generating function; its equation was provided as an image in the original description and is not reproduced here.
In that equation, rand(x) produces a random integer between zero and x. The data appears highly periodic, but never exactly repeats itself. This feature is designed to challenge the indexing structure.
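Since the generating equation itself is not reproduced above, the following sketch only illustrates the kind of pseudo-periodic generator described: a sum of sinusoids whose frequencies are perturbed by rand(x) at every sample. The weights and frequency ranges are assumptions for demonstration, not the original formula:

import numpy as np

rng = np.random.default_rng()

def pseudo_periodic_section(n_points: int = 100_000) -> np.ndarray:
    # Illustrative generator in the spirit of the description above: each sample
    # perturbs the sinusoid frequencies by a random integer between zero and x
    # (the rand(x) of the text), so the signal looks periodic but never repeats.
    t = np.arange(n_points) / n_points
    y = np.zeros(n_points)
    for i in range(3, 8):
        rand_x = rng.integers(0, 2**i + 1, size=n_points)  # rand(2**i), drawn per sample
        y += (1 / 2**i) * np.sin(2 * np.pi * (2**(2 + i) + rand_x) * t)
    return y

section = pseudo_periodic_section()
print(section.min(), section.max())  # values stay roughly within [-0.5, 0.5]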
The data is stored in one ASCII file. There are 10 columns, 100,000 rows. All data points are in the range -0.5 to +0.5. Rows are separated by carriage returns, columns by spaces.
Acknowledgements, Copyright Information, and Availability: Freely available for research use.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains raw, unprocessed data files pertaining to the management tool 'Benchmarking'. The data originates from five distinct sources, each reflecting different facets of the tool's prominence and usage over time. Files preserve the original metrics and temporal granularity before any comparative normalization or harmonization.

Data Sources & File Details:

Google Trends File (Prefix: GT_): Metric: Relative Search Interest (RSI) Index (0-100 scale). Keywords Used: "benchmarking" + "benchmarking management". Time Period: January 2004 - January 2025 (Native Monthly Resolution). Scope: Global Web Search, broad categorization. Extraction Date: Data extracted January 2025. Notes: Index relative to peak interest within the period for these terms. Reflects public/professional search interest trends. Based on probabilistic sampling. Source URL: Google Trends Query

Google Books Ngram Viewer File (Prefix: GB_): Metric: Annual Relative Frequency (% of total n-grams in the corpus). Keywords Used: Benchmarking. Time Period: 1950 - 2022 (Annual Resolution). Corpus: English. Parameters: Case Insensitive OFF, Smoothing 0. Extraction Date: Data extracted January 2025. Notes: Reflects term usage frequency in Google's digitized book corpus. Subject to corpus limitations (English bias, coverage). Source URL: Ngram Viewer Query

Crossref.org File (Prefix: CR_): Metric: Absolute count of publications per month matching keywords. Keywords Used: "benchmarking" AND ("process" OR "management" OR "performance" OR "best practices" OR "implementation" OR "approach" OR "evaluation" OR "methodology"). Time Period: 1950 - 2025 (Queried for monthly counts based on publication date metadata). Search Fields: Title, Abstract. Extraction Date: Data extracted January 2025. Notes: Reflects volume of relevant academic publications indexed by Crossref. Deduplicated using DOIs; records without DOIs omitted. Source URL: Crossref Search Query

Bain & Co. Survey - Usability File (Prefix: BU_): Metric: Original Percentage (%) of executives reporting tool usage. Tool Names/Years Included: Benchmarking (1993, 1996, 1999, 2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2017). Respondent Profile: CEOs, CFOs, COOs, other senior leaders; global, multi-sector. Source: Bain & Company Management Tools & Trends publications (Rigby D., Bilodeau B., et al., various years: 1994, 2001, 2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017). Note: Tool not included in the 2022 survey data. Data Compilation Period: July 2024 - January 2025. Notes: Data points correspond to specific survey years. Sample sizes: 1993/500; 1996/784; 1999/475; 2000/214; 2002/708; 2004/960; 2006/1221; 2008/1430; 2010/1230; 2012/1208; 2014/1067; 2017/1268.

Bain & Co. Survey - Satisfaction File (Prefix: BS_): Metric: Original Average Satisfaction Score (Scale 0-5). Tool Names/Years Included: Benchmarking (1993, 1996, 1999, 2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014, 2017). Respondent Profile: CEOs, CFOs, COOs, other senior leaders; global, multi-sector. Source: Bain & Company Management Tools & Trends publications (Rigby D., Bilodeau B., et al., various years: 1994, 2001, 2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017). Note: Tool not included in the 2022 survey data. Data Compilation Period: July 2024 - January 2025. Notes: Data points correspond to specific survey years. Sample sizes: 1993/500; 1996/784; 1999/475; 2000/214; 2002/708; 2004/960; 2006/1221; 2008/1430; 2010/1230; 2012/1208; 2014/1067; 2017/1268. Reflects subjective executive perception of utility.
File Naming Convention: Files generally follow the pattern PREFIX_Tool.csv, where the PREFIX indicates the data source:
GT_: Google Trends
GB_: Google Books Ngram
CR_: Crossref.org (Count Data for this Raw Dataset)
BU_: Bain & Company Survey (Usability)
BS_: Bain & Company Survey (Satisfaction)
The essential identification comes from the PREFIX and the Tool Name segment. This dataset resides within the 'Management Tool Source Data (Raw Extracts)' Dataverse.
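A minimal sketch of resolving the source from a file name under this convention (the example file name is hypothetical):

SOURCE_BY_PREFIX = {
    "GT": "Google Trends",
    "GB": "Google Books Ngram",
    "CR": "Crossref.org",
    "BU": "Bain & Company Survey (Usability)",
    "BS": "Bain & Company Survey (Satisfaction)",
}

def describe(filename: str) -> str:
    # PREFIX_Tool.csv -> "<source>: <tool>"
    prefix, tool = filename.removesuffix(".csv").split("_", 1)
    return f"{SOURCE_BY_PREFIX[prefix]}: {tool}"

print(describe("GT_Benchmarking.csv"))  # hypothetical file name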
Population is the sum of births plus in-migration, and it signifies the total market size possible in the area. This is an important metric for economic developers to measure their economic health and investment attraction. Businesses also use this as a metric for market size when evaluating startup, expansion or relocation decisions.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset provides a comprehensive view of the aging process of lithium-ion batteries, facilitating the estimation of their Remaining Useful Life (RUL). Originally sourced from NASA's open repository, the dataset has undergone meticulous preprocessing to enhance its analytical utility. The data is presented in a user-friendly CSV format after extracting relevant features from the original .mat files.
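A minimal sketch of the kind of .mat-to-CSV conversion described above, assuming a generic MATLAB file; the file name and the placeholder column names are hypothetical and not taken from the original preprocessing:

from scipy.io import loadmat
import pandas as pd

# Hypothetical input file name; adjust to the actual NASA .mat files.
mat = loadmat("battery_cycles.mat", squeeze_me=True)

# Inspect the variables stored in the .mat file before deciding what to extract.
print([key for key in mat.keys() if not key.startswith("__")])

# Once the relevant arrays are identified, a flat table can be written out as CSV.
# The column names below are placeholders, not the columns of this dataset.
df = pd.DataFrame({"cycle_index": [], "capacity": [], "temperature": []})
df.to_csv("battery_features.csv", index=False)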
Battery Performance Metrics:
Environmental Conditions:
Identification Attributes:
Processed Data:
Labels:
Battery Health Monitoring:
Data Science and Machine Learning:
Research and Development:
The dataset was retrieved from NASA's publicly available data repositories. It has been preprocessed to align with research and industrial standards for usability in analytical tasks.
Leverage this dataset to enhance your understanding of lithium-ion battery degradation and build models that could revolutionize energy storage solutions.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains raw, unprocessed data files pertaining to the management tool 'Zero-Based Budgeting' (ZBB), including related concepts like Priority Based Budgeting. The data originates from five distinct sources, each reflecting different facets of the tool's prominence and usage over time. Files preserve the original metrics and temporal granularity before any comparative normalization or harmonization.

Data Sources & File Details:

Google Trends File (Prefix: GT_): Metric: Relative Search Interest (RSI) Index (0-100 scale). Keywords Used: "zero based budgeting" + "priority based budgeting" + "zero based budgeting management". Time Period: January 2004 - January 2025 (Native Monthly Resolution). Scope: Global Web Search, broad categorization. Extraction Date: Data extracted January 2025. Notes: Index relative to peak interest within the period for these terms. Reflects public/professional search interest trends. Based on probabilistic sampling. Source URL: Google Trends Query

Google Books Ngram Viewer File (Prefix: GB_): Metric: Annual Relative Frequency (% of total n-grams in the corpus). Keywords Used: Zero Based Budgeting + Priority Based Budgeting + Program Budgeting. Time Period: 1950 - 2022 (Annual Resolution). Corpus: English. Parameters: Case Insensitive OFF, Smoothing 0. Extraction Date: Data extracted January 2025. Notes: Reflects term usage frequency in Google's digitized book corpus. Subject to corpus limitations (English bias, coverage). Source URL: Ngram Viewer Query

Crossref.org File (Prefix: CR_): Metric: Absolute count of publications per month matching keywords. Keywords Used: ("zero based budgeting" OR "priority based budgeting" OR "program budgeting") AND ("management" OR "financial" OR "budgeting process" OR "planning" OR "control" OR "system"). Time Period: 1950 - 2025 (Queried for monthly counts based on publication date metadata). Search Fields: Title, Abstract. Extraction Date: Data extracted January 2025. Notes: Reflects volume of relevant academic publications indexed by Crossref. Deduplicated using DOIs; records without DOIs omitted. Source URL: Crossref Search Query

Bain & Co. Survey - Usability File (Prefix: BU_): Metric: Original Percentage (%) of executives reporting tool usage. Tool Names/Years Included: Zero-Based Budgeting (2012, 2014, 2017, 2022). Respondent Profile: CEOs, CFOs, COOs, other senior leaders from multinational corporations and medium-sized enterprises across strategy, marketing, HR, etc.; global, multi-sector. Source: Bain & Company Management Tools & Trends publications (Rigby D., Bilodeau B., Ronan C. et al., various years: 2013, 2015, 2017, 2023). Data Compilation Period: July 2024 - January 2025. Notes: Data points correspond to specific survey years. Sample sizes: 2012/1208; 2014/1067; 2017/1268; 2022/1068.

Bain & Co. Survey - Satisfaction File (Prefix: BS_): Metric: Original Average Satisfaction Score (Scale 0-5). Tool Names/Years Included: Zero-Based Budgeting (2012, 2014, 2017, 2022). Respondent Profile: CEOs, CFOs, COOs, other senior leaders from multinational corporations and medium-sized enterprises across strategy, marketing, HR, etc.; global, multi-sector. Source: Bain & Company Management Tools & Trends publications (Rigby D., Bilodeau B., Ronan C. et al., various years: 2013, 2015, 2017, 2023). Data Compilation Period: July 2024 - January 2025. Notes: Data points correspond to specific survey years. Sample sizes: 2012/1208; 2014/1067; 2017/1268; 2022/1068. Reflects subjective executive perception of utility.
File Naming Convention: Files generally follow the pattern PREFIX_Tool.csv, where the PREFIX indicates the data source:
GT_: Google Trends
GB_: Google Books Ngram
CR_: Crossref.org (Count Data for this Raw Dataset)
BU_: Bain & Company Survey (Usability)
BS_: Bain & Company Survey (Satisfaction)
The essential identification comes from the PREFIX and the Tool Name segment. This dataset resides within the 'Management Tool Source Data (Raw Extracts)' Dataverse.
By US Open Data Portal, data.gov [source]
This dataset provides a list of all Home Health Agencies registered with Medicare. Contained within this dataset is information on each agency's address, phone number, type of ownership, quality measure ratings and other associated data points. With this valuable insight into the operations of each Home Health Care Agency, you can make informed decisions about your care needs. Learn more about the services offered at each agency and how they are rated according to their quality measure ratings. From dedicated nursing care services to speech pathology to medical social services - get all the information you need with this comprehensive look at U.S.-based Home Health Care Agencies!
Are you looking to learn more about Home Health Care Agencies registered with Medicare? This dataset can provide quality measure ratings, addresses, phone numbers, types of services offered and other information that may be helpful when researching Home Health Care Agencies.
This guide will explain how to use the data in this dataset to gain a better understanding of Home Health Care Agencies registered with Medicare.
First, you will need to become familiar with the columns in the dataset. A list of all columns and their associated descriptions is provided above for your reference. Once you understand each column’s purpose, it will be easier for you to decide what metrics or variables are most important for your own research.
Next, use this data to compare various facets between different Home Health Care Agencies, such as type of ownership, services offered, and quality measure ratings like the star rating (from 0-5 stars) or the CMS certification number. Collecting information from multiple sources such as public reviews or customer feedback can help supplement these numerical metrics in order to paint a more accurate picture of each agency's performance and customer satisfaction level.
Finally, once you have collected enough data points on one particular agency, or for a comparison between multiple agencies, conduct further analysis using statistical methods like correlation matrices to determine any patterns in the data set that may reveal valuable insights into the research topic at hand.
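A minimal sketch of such a correlation check, assuming the CSV has been loaded with pandas; the numeric column names used here are hypothetical, since the column listing below is truncated:

import pandas as pd

# Hypothetical numeric column names; substitute the actual columns from csv-1.csv.
df = pd.read_csv("csv-1.csv")
numeric_columns = ["quality_of_patient_care_star_rating", "episodes_billed_to_medicare"]

# Pairwise Pearson correlations between the selected numeric metrics.
correlation_matrix = df[numeric_columns].corr()
print(correlation_matrix)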
- Using the data to compare quality of care ratings between agencies, so people can make better informed decisions about which agency to hire for home health services.
- Analyzing the costs associated with different types of home health care services, such as nursing care and physical therapy, in order to determine where money could be saved in health care budgets.
- Evaluating the performance of certain agencies by analyzing the number of episodes billed to Medicare compared to the national averages, allowing agencies with lower numbers of billing episodes to be identified and monitored more closely if necessary.
If you use this dataset in your research, please credit the original authors. Data Source
Unknown License - Please check the dataset description for more information.
File: csv-1.csv
| Column name | Description |
|:----------------------------------------...