https://creativecommons.org/publicdomain/zero/1.0/
This dataset represents real-time data collected from strategically placed microphones and sensors in a music education platform. It contains information about students' performances during music lessons and the associated teaching effectiveness. The dataset aims to simulate the analysis of musical performances, including the accuracy of pitch, rhythm, and dynamics, as well as the evaluation of student engagement and teacher feedback.
Features:
- Timestamp: The exact date and time when the data was collected. Type: DateTime. Example: 2024-12-20 10:00:00.
- Sensor ID: Unique identifier for each microphone or sensor monitoring the performance. Type: Categorical. Example: Sensor_001, Sensor_002.
- Student ID: Identifier for the student whose performance is being monitored. Type: Categorical. Example: Student_001, Student_002.
- Instrument Type: The type of musical instrument being played during the lesson. Type: Categorical. Example: Piano, Guitar, Violin.
- Pitch (Hz): The frequency (in Hertz) of the sound produced by the instrument. Type: Numerical (Continuous). Example: 440 Hz (A4 note).
- Rhythm (BPM): The tempo of the music being played, measured in beats per minute (BPM). Type: Numerical (Continuous). Example: 120 BPM.
- Dynamics (dB): The loudness or intensity of the sound produced by the instrument, measured in decibels (dB). Type: Numerical (Continuous). Example: 75 dB.
- Note Duration (s): The length of time for which each note is held during the performance. Type: Numerical (Continuous). Example: 0.5 seconds.
- Pitch Accuracy (%): The accuracy with which the pitch produced matches the intended pitch, expressed as a percentage. Type: Numerical (Continuous). Example: 95%.
- Rhythm Accuracy (%): The accuracy with which the rhythm (tempo and timing) matches the intended pattern, expressed as a percentage. Type: Numerical (Continuous). Example: 100%.
- Teaching Effectiveness Rating: A rating given to evaluate the effectiveness of the teacher's instruction based on student performance. Type: Categorical (Ordinal). Example: 5/5, 4/5.
- Lesson Type: The type of lesson or session being conducted (e.g., beginner, advanced, or practice). Type: Categorical. Example: Beginner Lesson, Advanced Lesson, Practice Session.
- Student Engagement Level: The level of student engagement during the lesson, measured as either High, Medium, or Low. Type: Categorical. Example: High, Medium, Low.
- Teacher Feedback: Feedback provided by the teacher based on the student's performance. Type: Categorical. Example: Good rhythm, Needs improvement, Excellent performance.
- Environmental Factors: The environmental conditions in which the lesson is conducted, which could influence the quality of the performance data (e.g., background noise). Type: Categorical. Example: Quiet, Slight Background Noise, Noisy.
- Student Progress (%): A measure of student progress over time, expressed as a percentage of improvement in skills. Type: Numerical (Continuous). Example: 85%, 60%.
- Target (Performance Evaluation): A classification based on performance score. Students are classified as either High Performance or Low Performance based on their pitch and rhythm accuracy. Type: Categorical. Example: High Performance, Low Performance.

Target column definition: the Performance Score is the average of pitch and rhythm accuracy. If the performance score exceeds 90%, the student is classified as High Performance; otherwise, they are classified as Low Performance.
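The target definition above can be reproduced directly from the two accuracy columns. A minimal sketch, assuming the column names listed above; the file name music_performance.csv is hypothetical:

```python
import pandas as pd

# Hypothetical file name; column names follow the feature list above.
df = pd.read_csv("music_performance.csv")

# Performance Score = average of pitch and rhythm accuracy.
score = (df["Pitch Accuracy (%)"] + df["Rhythm Accuracy (%)"]) / 2

# Above 90% -> High Performance, otherwise Low Performance.
df["Target"] = score.gt(90).map({True: "High Performance", False: "Low Performance"})
```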
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We consider a neighborhood random walk on a quadrant {(X_1(t), X_2(t), φ(t)) : t ≥ 0} with environment phase variable φ(t) modeled by a continuous-time Markov chain with φ(t) ∈ S_{nm} when X_1(t) = n, X_2(t) = m. We describe this random walk using a two-dimensional level-dependent Quasi-Birth-and-Death process (2D-LD-QBD) with phase variable φ(t) and level variables X_1(t), X_2(t) ∈ {0, 1, 2, …}, which change in a skip-free manner at the jump times of the process. We transform this random walk into a one-dimensional LD-QBD {(Z(t), χ(t)) : t ≥ 0} with level variable Z(t) ∈ {0, 1, 2, …} recording the maximum of the two level variables and phase variable χ(t) = (χ_1(t), χ_2(t), φ(t)) recording the remaining information about the random walk. Using this transformation, we perform transient and stationary analysis of the random walk, including first hitting times for various sample paths, using matrix-analytic methods. We also construct a sequence of neighborhood random walks, represented as two-dimensional QBDs ({(X_1^{(k)}(t), X_2^{(k)}(t), φ(t)) : t ≥ 0})_{k=1,2,…}, converging in distribution to a two-dimensional stochastic fluid model (SFM) {(Y_1(t), Y_2(t), φ(t)) : t ≥ 0}, which describes a movement on a quadrant in which the position changes in a continuous manner according to rates dY_1(t)/dt = c_{1,φ(t)} and dY_2(t)/dt = c_{2,φ(t)}, modulated by the underlying phase process {φ(t) : t ≥ 0}. Numerical examples are provided to illustrate the application of the methodology.
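Restated in display form, using only the quantities defined in the abstract, the transformation and the fluid-model rates are:

```latex
Z(t) = \max\{X_1(t), X_2(t)\}, \qquad
\chi(t) = \bigl(\chi_1(t), \chi_2(t), \varphi(t)\bigr), \qquad
\frac{\mathrm{d}Y_i(t)}{\mathrm{d}t} = c_{i,\varphi(t)}, \quad i = 1, 2.
```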
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Table 2. Glossary of the terms used in Xper3.
Term | Definition |
---|---|
applicability or inapplicability | Status of a descriptor in a description. A descriptor is applicable in a description if the description does not contain an eliminatory state as an attribute value for the parent descriptor. |
attribute | Data associating an item and a descriptor. In the case of a categorical descriptor, an attribute can be one or a set of states if expressing polymorphism is needed for the item. In the case of a numerical descriptor, the attribute is a statistical distribution of values expressed by its minimum and maximum or by its mean and standard deviation. Syn. element description. |
calculated descriptor | A descriptor whose values are automatically calculated from other data using logical (Boolean) operators. |
categorical descriptor | A descriptor that takes qualitative states or numerical intervals as possible values. Examples: “blade shape”; “number of antennae articles” with states “less than five”; “five to ten”; “more than ten”. |
checkbase | A tool to check the consistency of the descriptions and to compare them in pairs. |
description | Set of attributes relating to the same item. |
descriptive model | Set of descriptors with their relationships (hierarchy) and their list of states or metrics. The descriptive model defines the common terminology for all the descriptions in a knowledge base. |
descriptor | Element used to describe items, essentially a character for taxonomists, or a trait for ecologists. |
discriminant power | Ability of a descriptor to distinguish between items. |
elimination state | State used in a rule to define the condition of inapplicability of a descriptor. |
extension | Volume of the representation space corresponding to the description of an item. It is therefore the set of single descriptions (without variability, one value per descriptor) compatible with the item description. |
group | Grouping, under a chosen heading, a set of objects of the same nature; a set of descriptors, or a set of items. |
Ikey+ | Web service for generating a single-access identification key. |
item | Object which can be described in a knowledge base. For taxonomists, it usually is a taxon. |
key | A practical means to identify a specimen. We distinguish the interactive key (see Mkey+) from the single-access key (see Ikey+). |
knowledge base | Dataset in an Xper3 database. |
Mkey+ | Web service for multi-access interactive identification key. |
numerical descriptor | A descriptor expressing a continuous numerical value, with a measurement unit, e.g., antenna length in mm. |
parent descriptor | A descriptor used to define the rule of inapplicability of another descriptor (child descriptor). |
state (categorical state) | A possible value of categorical descriptors, e.g., for a descriptor named “Colour of the eye”, its states could be “Blue”, “Black”, etc. |
unknown value | Special value to express ignorance. In case of an unknown value, Xper3 considers all the states or all the numerical values as equally possible. |
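To make the relationships in this glossary concrete, here is a minimal sketch of the data model it implies; all class and field names are illustrative, not Xper3's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class CategoricalDescriptor:
    name: str          # e.g. "Colour of the eye"
    states: list[str]  # possible states, e.g. ["Blue", "Black"]

@dataclass
class NumericalDescriptor:
    name: str          # e.g. "Antenna length"
    unit: str          # measurement unit, e.g. "mm"

@dataclass
class Item:
    name: str  # e.g. a taxon name
    # attribute: a set of states per categorical descriptor (polymorphism allowed)
    categorical: dict[str, set[str]] = field(default_factory=dict)
    # attribute: a (min, max) range per numerical descriptor
    numerical: dict[str, tuple[float, float]] = field(default_factory=dict)

eye = CategoricalDescriptor("Colour of the eye", ["Blue", "Black", "Brown"])
taxon = Item("Example taxon", categorical={eye.name: {"Blue", "Black"}})  # polymorphic attribute
```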
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data set downloaded from a survey administered via Google Forms. Qualitative data were converted into dummy variables. For example, a respondent choosing the answer "I took USMLE Step 1" is assigned a "1" (one), and one choosing "I did not take USMLE Step 1" is assigned a "0" (zero). Numerical/continuous data are available as downloaded.
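As a sketch of the dummy-variable conversion described here, the column name is illustrative and the answer strings match the example above:

```python
import pandas as pd

# Illustrative column name; answer text matches the example above.
df = pd.DataFrame({"usmle_step1": ["I took USMLE Step 1", "I did not take USMLE Step 1"]})
df["step1_dummy"] = (df["usmle_step1"] == "I took USMLE Step 1").astype(int)  # 1 / 0
```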
Two broadband seismometers were installed on the 4100 level and recorded for the duration of EGS Collab Experiment #2. Inspired by published data from similar instruments installed in the Äspö Hard Rock Laboratory, these long-period instruments aimed to measure the tilting of the drift in response to the injection of fluid into the testbed. One instrument was installed underneath the wellheads in Site A (aka the "battery" alcove) and the other was installed along the east wall of the drift, south of Site B. Due to the feet of gravel (ballast) laid along the floor of the drift, we were unable to anchor the sensors directly to the rock. As a result, the coupling of the sensors to the experiment rock volume is likely poor. In addition, there are a number of noise sources that complicate the interpretation of the data. For example, sensor BBB is installed adjacent (within 3 ft) to the rail line that runs towards the Ross shaft. Trains (motors) run along this line almost daily and produce a large signal in these data. Careful extraction of periods of interest, as well as filtering for specific signals, is necessary. The sensors are Nanometrics Trillium Compact Posthole seismometers, sensitive down to a period of 120 seconds. They were installed as close to the drift wall and as deep as we could manually excavate (only about 1 ft or so). The holes were leveled with sand and the sensors were placed on a paver before backfilling with sand. The hole was then covered by a bucket filled with insulation to improve the sensor's isolation from daily temperature variations, which are minor but present due to drift ventilation from the surface. Data were recorded on Nanometrics Centaur digitizers at 100 Hz. The full response information is available in the StationXML file provided here, or by querying the sensors through the IRIS DMC (see links below). These instruments were provided free of charge through the IRIS PASSCAL instrument center. The network code is XP and the station codes are BBA and BBB. The waveform data can be queried through the IRIS FDSN server using any method the user likes. One convenient option is to use the ObsPy Python package: https://docs.obspy.org/packages/obspy.clients.fdsn.html
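For example, the waveforms can be fetched with ObsPy's FDSN client using the network and station codes given above; the time window below is an arbitrary placeholder:

```python
from obspy import UTCDateTime
from obspy.clients.fdsn import Client

client = Client("IRIS")
t0 = UTCDateTime("2022-06-01T00:00:00")  # placeholder start time

# Network XP, station BBA (or BBB), as described above; one hour of data.
st = client.get_waveforms(network="XP", station="BBA", location="*",
                          channel="*", starttime=t0, endtime=t0 + 3600)
st.plot()
```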
https://spdx.org/licenses/CC0-1.0.html
1. State-and-transition simulation models (STSMs) provide a general framework for forecasting landscape dynamics, including projections of both vegetation and land-use/land-cover (LULC) change. The STSM method divides a landscape into spatially-referenced cells and then simulates the state of each cell forward in time, as a discrete-time stochastic process using a Monte Carlo approach, in response to any number of possible transitions. A current limitation of the STSM method, however, is that all of the state variables must be discrete.
2. Here we present a new approach for extending an STSM to account for continuous state variables, called a state-and-transition simulation model with stocks and flows (STSM-SF). The STSM-SF method allows any number of continuous stocks to be defined for every spatial cell in the STSM, along with a suite of continuous flows specifying the rates at which stock levels change over time. The change in the level of each stock is then simulated forward in time, for each spatial cell, as a discrete-time stochastic process. The method differs from the traditional system-dynamics approach to stock-flow modelling in that the stocks and flows can be spatially explicit, and the flows can be expressed as a function of the STSM states and transitions (a minimal simulation sketch follows this abstract).
3. We demonstrate the STSM-SF method by integrating a spatially explicit carbon (C) budget model with an STSM of LULC change for the state of Hawai'i, USA. In this example, continuous stocks are pools of terrestrial C, while the flows are the possible fluxes of C between these pools. Importantly, several of these C fluxes are triggered by corresponding LULC transitions in the STSM. Model outputs include changes in the spatial and temporal distribution of C pools and fluxes across the landscape in response to projected future changes in LULC over the next 50 years.
4. The new STSM-SF method allows both discrete and continuous state variables, including interactions between them, to be integrated into an STSM. With the addition of stocks and flows, STSMs provide a conceptually simple yet powerful approach for characterizing uncertainties in projections addressing a wide range of questions regarding landscape change.
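As noted in point 2, the following minimal sketch illustrates the idea of a spatially explicit stock updated by both continuous flows and transition-triggered fluxes; all states, rates, and pool sizes are invented for illustration and are not the paper's parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, years = 1000, 50

# Illustrative two-state model: 0 = vegetated, 1 = developed (absorbing).
P = np.array([[0.98, 0.02],
              [0.00, 1.00]])          # annual transition probabilities (assumed)
states = np.zeros(n_cells, dtype=int)
stock = np.full(n_cells, 100.0)       # continuous stock per cell (e.g. Mg C, assumed)
growth, loss_frac = 1.5, 0.6          # illustrative flow rates

for _ in range(years):
    new = np.array([rng.choice(2, p=P[s]) for s in states])
    # Flux triggered by the LULC transition itself (state 0 -> 1).
    stock[(states == 0) & (new == 1)] *= 1.0 - loss_frac
    # Continuous flow for cells that remain vegetated.
    stock[new == 0] += growth
    states = new

print(f"developed fraction: {states.mean():.2f}, mean stock: {stock.mean():.1f}")
```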
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The compressed file data.zip contains an example of order OD data generated by taxi operations in Xiamen (using data generated on June 18, 2020 as an example). The compressed file code.zip accompanies a paper under review ("Continuous Flows in Multi-Temporal Mobility Networks: A New Method for Detecting High-Order Spatio-Temporal Patterns in OD Data"). In code.zip, the file Extraction_of_STCFs.py is used to extract STCFs from multiple csv files containing the order data and to calculate the various metrics for STCF and LOOP used in the article. The file Frequent_pattern_mining_on_STCFs.py is used to mine frequent patterns from a large number of STCFs.
https://spdx.org/licenses/CC0-1.0.html
Despite being the objects of numerous macroevolutionary studies, many of the best-represented constituents of the fossil record—including diverse examples such as foraminifera, brachiopods, and mollusks—have mineralized skeletons with limited discrete characteristics, making morphological phylogenies difficult to construct. In contrast to their paucity of phylogenetic characters, the mineralized structures (tests and shells) of these fossil groups frequently have distinctive shapes that have long proved useful for their classification. The recent introduction of methodologies for including continuous data directly in a phylogenetic analysis has increased the number of available characters, making it possible to produce phylogenies based in whole or part on continuous character data collected from such taxa. Geometric morphometric methods provide tools for accurately characterizing shape variation and can produce quantitative data that can therefore now be included in a phylogenetic matrix in a non-arbitrary manner. Here, the marine gastropod genus Conus is used to evaluate the ability of continuous characters—generated from a geometric morphometric analysis of shell shape—to contribute to a total evidence phylogenetic hypothesis constructed using molecular and morphological data. Furthermore, the ability of continuous characters derived from geometric morphometric analyses to place fossil taxa with limited discrete characters into a phylogeny with their extant relatives was tested by simulating the inclusion of fossil taxa. This was done by removing the molecular partition of individual extant species to produce a “cladistic pseudofossil” with only the geometric morphometric derived characters coded. The phylogenetic position of each cladistic pseudofossil taxon was then compared with its placement in the total evidence tree and a symmetric resampling tree to evaluate the degree to which morphometric characters alone can correctly place simulated fossil species. In 33-45% of the test cases (depending upon the approach used for measuring success), it was possible to place the pseudofossil taxon into the correct regions of the phylogeny using only the morphometric characters. This suggests that the incorporation of extinct Conus taxa into phylogenetic hypotheses will be possible, permitting a wide range of macroevolutionary questions to be addressed within this genus. This methodology also has potential to contribute to phylogenetic reconstructions for other major components of the fossil record that lack numerous discrete characters.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Can machine learning effectively lower the effort necessary to extract important information from raw data for hydrological research questions? Using the example of a typical water-management task, the extraction of direct-runoff flood events from continuous hydrographs, we demonstrate how machine learning can be used to automate the application of expert knowledge to big data sets and extract the relevant information. In particular, we tested seven different algorithms to detect event beginning and end solely from a given excerpt from the continuous hydrograph. First, the number of required data points within the excerpts, as well as the amount of training data, was determined. In a local application, we were able to show that all applied machine learning algorithms were capable of reproducing manually defined event boundaries. Automatically delineated events were afflicted with a relative error of 20% in event duration and 5% in event volume. Moreover, we could show that hydrograph separation patterns could easily be learned by the algorithms and are regionally and trans-regionally transferable without significant performance loss. Hence, the training data sets can be very small and trained algorithms can be applied to new catchments lacking training data. The results showed the great potential of machine learning to extract relevant information efficiently and, hence, lower the effort for data preprocessing in water management studies. Moreover, the transferability of trained algorithms to other catchments is a clear advantage over common methods.
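A minimal sketch of this framing (regressing event start and end positions from a fixed-length hydrograph excerpt), using entirely synthetic placeholder data; the study's actual feature layout, labels, and algorithms may differ:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n, window = 200, 96                     # 200 excerpts of 96 discharge values (invented sizes)
X = rng.random((n, window))             # placeholder hydrograph excerpts
y = np.sort(rng.integers(0, window, (n, 2)), axis=1)  # placeholder [start, end] indices

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
start_end = model.predict(X[:1])        # predicted event boundaries for one excerpt
```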
Abalone is the common name for a group of small to very large sea snails, commonly found along coasts across the world, used as a delicacy in cuisines; its leftover shell is fashioned into jewelry due to its iridescent luster. Because of its demand and economic value, it is often harvested in farms, hence the need to predict the age of abalone from physical measurements. The traditional approach to determining its age is to cut the shell through the cone, stain it, and count the number of rings through a microscope -- a tedious and time-consuming task.
From the original data, examples with missing values were removed (the majority having the predicted value missing), and the ranges of the continuous values have been scaled for use with an ANN (by dividing by 200).
Number of instances: 4177
Number of attributes: 8
Features: Sex, Length, Diameter, Height, Whole weight, Shucked weight, Viscera weight, and Shell weight
Target: Rings
Note: The number of rings is the value to predict, either as a continuous value or converted to a classification problem.
Given below are the attribute name, data type, measurement unit, and a brief description.
Name Data Type Meas. Description
----- --------- ----- -----------
Sex nominal M, F, and I (infant)
Length continuous mm Longest shell measurement
Diameter continuous mm perpendicular to length
Height continuous mm with meat in shell
Whole weight continuous grams whole abalone
Shucked weight continuous grams weight of meat
Viscera weight continuous grams gut weight (after bleeding)
Shell weight continuous grams after being dried
Rings integer +1.5 gives the age in years
Dataset comes from UCI Machine Learning repository: https://archive.ics.uci.edu/ml/datasets/Abalone
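A minimal sketch of loading the data and converting rings to age per the table above; the file path is the UCI repository's customary location for this dataset but may change, and the column names follow the feature list:

```python
import pandas as pd

cols = ["Sex", "Length", "Diameter", "Height", "Whole weight",
        "Shucked weight", "Viscera weight", "Shell weight", "Rings"]
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data"
df = pd.read_csv(url, header=None, names=cols)

df["Age"] = df["Rings"] + 1.5  # per the attribute table: rings + 1.5 gives age in years
# Note: the preprocessed version described above additionally scales the
# continuous columns by dividing by 200.
```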
This dataset, termed "GAGES II", an acronym for Geospatial Attributes of Gages for Evaluating Streamflow, version II, provides geospatial data and classifications for 9,322 stream gages maintained by the U.S. Geological Survey (USGS). It is an update to the original GAGES, which was published as a Data Paper on the journal Ecology's website (Falcone and others, 2010b) in 2010. The GAGES II dataset consists of gages which have had either 20+ complete years (not necessarily continuous) of discharge record since 1950, or are currently active, as of water year 2009, and whose watersheds lie within the United States, including Alaska, Hawaii, and Puerto Rico. Reference gages were identified based on indicators that they were the least-disturbed watersheds within the framework of broad regions, based on 12 major ecoregions across the United States. Of the 9,322 total sites, 2,057 are classified as reference, and 7,265 as non-reference. Of the 2,057 reference sites, 1,633 have (through 2009) 20+ years of record since 1950. Some sites have very long flow records: a number of gages have been in continuous service since 1900 (at least), and have 110 years of complete record (1900-2009) to date. The geospatial data include several hundred watershed characteristics compiled from national data sources, including environmental features (e.g. climate – including historical precipitation, geology, soils, topography) and anthropogenic influences (e.g. land use, road density, presence of dams, canals, or power plants). The dataset also includes comments from local USGS Water Science Centers, based on Annual Data Reports, pertinent to hydrologic modifications and influences. The data posted also include watershed boundaries in GIS format. This overall dataset is different in nature to the USGS Hydro-Climatic Data Network (HCDN; Slack and Landwehr 1992), whose data evaluation ended with water year 1988. The HCDN identifies stream gages which at some point in their history had periods which represented natural flow, and the years in which those natural flows occurred were identified (i.e. not all HCDN sites were in reference condition even in 1988, for example, 02353500). The HCDN remains a valuable indication of historic natural streamflow data. However, the goal of this dataset was to identify watersheds which currently have near-natural flow conditions, and the 2,057 reference sites identified here were derived independently of the HCDN. A subset, however, noted in the BasinID worksheet as "HCDN-2009", has been identified as an updated list of 743 sites for potential hydro-climatic study. The HCDN-2009 sites fulfill all of the following criteria: (a) have 20 years of complete and continuous flow record in the last 20 years (water years 1990-2009), and were thus also currently active as of 2009, (b) are identified as being in current reference condition according to the GAGES-II classification, (c) have less than 5 percent imperviousness as measured from the NLCD 2006, and (d) were not eliminated by a review from participating state Water Science Center evaluators. The data posted here consist of the following items:
- This point shapefile, with summary data for the 9,322 gages.
- A zip file containing basin characteristics, variable definitions, and a more detailed report.
- A zip file containing shapefiles of basin boundaries, organized by classification and aggregated ecoregion.
- A zip file containing mainstem stream lines (Arc line coverages) for each gage.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Malaysia Imports: Other Lifting, Handling, Loading or Unloading Machinery (For Example, Lifts, Continuous-Action Elevators and Conveyors, For Goods or Materials): Other, Bucket Type data was reported at 0.495 MYR mn in Mar 2025. This records an increase from the previous number of 0.243 MYR mn for Feb 2025. Malaysia Imports: Other Lifting, Handling, Loading or Unloading Machinery (For Example, Lifts, Continuous-Action Elevators and Conveyors, For Goods or Materials): Other, Bucket Type data is updated monthly, averaging 0.315 MYR mn from Jan 2000 (Median) to Mar 2025, with 303 observations. The data reached an all-time high of 149.543 MYR mn in Nov 2005 and a record low of 0.000 MYR mn in Sep 2017. Malaysia Imports: Other Lifting, Handling, Loading or Unloading Machinery (For Example, Lifts, Continuous-Action Elevators and Conveyors, For Goods or Materials): Other, Bucket Type data remains active status in CEIC and is reported by Department of Statistics. The data is categorized under Global Database’s Malaysia – Table MY.DOS: Imports: by Commodity: HS 6: 71 to 98: Value.
Over the last 81 years, the CPR analysis team has analyzed more than a quarter of a million samples from over 6.5 million miles of tows in the North Sea, Norwegian Sea, North and South Atlantic, North Pacific, and Indian Oceans. In 2015 (to mid-December), approx. 124,600 nautical miles were sampled, with over 4,000 samples for analysis. Samples were taken in the North Atlantic and North Sea, Pacific, and Southern Ocean.
Spatial and temporal data are stored for every sample analysed by the CPR survey since 1946. This amounts to almost 170,000 samples, with around 200 more samples added per month. The presence of every planktonic entity identified on each sample is stored in the database, and there are almost 2 million plankton records in total. The database also contains supportive information such as tow locations, times and dates, ship details, a taxon catalogue and analyst details.
Over 400 entities have been identified on CPR samples, and the 'abundance' of each entity on each sample can be extracted from the database. Some plankton are identified to species level, some to genus level, and some at a higher taxonomic level. Some entities are groups of other entities. The complete Species List is kept in the database.
Data can be extracted from user-defined areas, over specified periods, for selected entities. For example, all samples taken from the Dogger Bank area in the North Sea during March, April, and May since 1946 could be extracted from the database, and the 'abundance' of selected diatom species on each sample could be listed. Alternatively, an average value, number of samples, and standard deviation per year per month could be retrieved. The data can be exported to statistical and presentation packages in many popular formats such as text, rich text, comma separated, MS Excel, MS Access, MS Word, Fox Pro, Dbase, Lotus and to SQL compliant databases. SAHFOS can supply some descriptive data at little cost (usually free).
If you would like to know more about CPR coverage of a particular location, contact the Data Manager, Darren Stevens, or David Johns at SAHFOS.
For information about methods and parameters, and link to DATA page: http://www.sahfos.ac.uk/cpr-data/database.aspx
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Malaysia Imports: Other Lifting, Handling, Loading or Unloading Machinery (For Example, Lifts, Other Continuous-Action Elevators and Conveyors, For Goods or Materials): Other, Belt Type data was reported at 12.127 MYR mn in Mar 2025. This records a decrease from the previous number of 17.228 MYR mn for Feb 2025. Malaysia Imports: Other Lifting, Handling, Loading or Unloading Machinery (For Example, Lifts, Other Continuous-Action Elevators and Conveyors, For Goods or Materials): Other, Belt Type data is updated monthly, averaging 2.290 MYR mn from Jan 2000 (Median) to Mar 2025, with 303 observations. The data reached an all-time high of 71.146 MYR mn in Jan 2019 and a record low of 0.005 MYR mn in May 2006. Malaysia Imports: Other Lifting, Handling, Loading or Unloading Machinery (For Example, Lifts, Other Continuous-Action Elevators and Conveyors, For Goods or Materials): Other, Belt Type data remains active status in CEIC and is reported by Department of Statistics. The data is categorized under Global Database’s Malaysia – Table MY.DOS: Imports: by Commodity: HS 6: 71 to 98: Value.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was used to assess the quality of the optical backscattering coefficient measurements taken during the BIOOPT2019 cruise in the Black Sea. The reference work by the same team is "Single dual mode (Continuous and Cast) instrumentation package for IOP measurements: characterisation of the bucket for backscattering observation". The dataset consists of two parts, which were used to estimate the underway system inertia and the system uncertainty, respectively. The former aimed at assessing the b4bbo (bucket for backscattering observations) water renewal time and consists only of data collected underway (16 data files), while the latter is made of measurements acquired both continuously (56 data files) and during stations (56 data files), profiling the water column.
Volume Scattering Function (VSF) data were acquired with a WET Labs, Inc. ECO-VSF3 sensor, from which the spectral backscattering coefficients can be derived; these are also included in the dataset. The ECO-VSF3 samples the water at three wavelengths (470, 532, 660 nm) and three angles (111, 138, 154 degrees). The data are presented as comma-separated files that also include time, location, and the water flow through the system. The file naming convention is: {instrument}-{context}-{two digits of progressive number}.csv. {context} can be as follows:
Each data file includes a two-line header with variable names and units. The fields included in each data file are:
CTD measurements from a Sea-Bird Scientific, Inc. MicroCAT SBE-37-SI were also included in the cast data files:
Apart from the manufacturer calibration, no post-processing was applied to these data. For each independent system-inertia experiment, an automatic electro-valve was activated twice: the first time, 122 seconds after the start of measurement acquisition, to redirect the water through a 0.2 µm filter before entering b4bbo for the non-particulate bb measurement; the second, 602 seconds after acquisition start, to let the water flow directly into b4bbo for the total bb measurement.
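Given the valve schedule above, the filtered (non-particulate) and total bb intervals can be separated along these lines; the file name and the elapsed-time column are hypothetical:

```python
import pandas as pd

df = pd.read_csv("ECOVSF3-inertia-01.csv")    # hypothetical file following the naming convention
t = df["elapsed_s"]                           # hypothetical elapsed-seconds column

non_particulate = df[(t >= 122) & (t < 602)]  # 0.2 µm filtered water in b4bbo
total = df[t >= 602]                          # unfiltered water: total bb
```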
This is a preprocessed dataset for a tutorial on binary classification problems. The data came from this dataset. I converted the data from CSV format to NumPy arrays so that it would be ready to use in a binary classification tutorial for beginners. I also selected just the 5 input features most correlated with the label, out of 15 input features.
There are two files:
- X.npy contains the input features (described in more detail below).
- y.npy contains the output labels: whether the patient had a risk of coronary heart disease in the next 10 years or not.
The input features are in a 4238 x 5 numpy array. The 4238 rows correspond to 4238 training examples, and the 5 columns are the features: male, age, prevalentHyp, sysBP, diaBP, in that order. Here is a description of each feature from the original dataset:
- male: whether the patient is male (Boolean)
- age: age of the patient (truncated to the nearest whole number) (Continuous)
- prevalentHyp: whether the patient was hypertensive (had high blood pressure) (Boolean)
- sysBP: systolic blood pressure (Continuous)
- diaBP: diastolic blood pressure (Continuous)
All credit to the original version of the dataset, including the references described there.
This data will be used to explore binary prediction problems and logistic regression.
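A minimal sketch of the intended use, loading the two files described above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X = np.load("X.npy")  # 4238 x 5 feature array
y = np.load("y.npy")  # 10-year CHD risk labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"test accuracy: {clf.score(X_te, y_te):.3f}")
```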
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Over the last 90 years, the CPR Survey analysis team has analyzed more than a quarter of a million samples from over 7 million miles of tows in the North Sea, Norwegian Sea, North and South Atlantic, North Pacific, and Indian Oceans.
Spatial and temporal data are stored at the Marine Biological Association of the UK (MBA) for every sample analyzed by the CPR Survey, since 1946. This amounts to over 261,000 samples, with around 200 more samples added per month. The presence of every planktonic entity identified on each sample is stored in the database, and there are over 2 million plankton records in total. The database also contains supportive information such as tow locations, times and dates, ship details, a taxon catalog and analyst details.
Over 800 zooplankton and phytoplankton entities have been identified on CPR samples, and the 'abundance' of each entity on each sample can be extracted from the database. Some plankton are identified to species level, some to genus level, and some at a higher taxonomic level. Some entities are groups of other entities. The complete Species List is kept in the database.
Data can be extracted from user-defined areas, over specified periods, for selected entities [from the 'The CPR Survey' site]. For example, all samples taken from the Dogger Bank area in the North Sea during March, April, and May since 1946 could be extracted from the database, and the 'abundance' of selected diatom species on each sample could be listed. Alternatively, an average value, number of samples, and standard deviation per year per month could be retrieved. The data can be exported to statistical and presentation packages in many popular formats such as text, rich text, comma-separated, MS Excel, MS Access, MS Word, Fox Pro, Dbase, Lotus, and to SQL compliant databases. The CPR Survey can supply some descriptive data at little cost (usually free).
Future updates are planned to extend the time range of this dataset at BCO-DMO. If you would like to know more about CPR coverage of a particular location, contact David Johns at The CPR Survey.
For information about methods and parameters, and link to The CPR Survey data page: https://www.cprsurvey.org/data/our-data/.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract: Multistate Markov models are a canonical parametric approach for data modeling of observed or latent stochastic processes supported on a finite state space. Continuous-time Markov processes describe data that are observed irregularly over time, as is often the case in longitudinal medical data, for example. Assuming that a continuous-time Markov process is time-homogeneous, a closed-form likelihood function can be derived from the Kolmogorov forward equations, a system of differential equations with a well-known matrix-exponential solution. Unfortunately, however, the forward equations do not admit an analytical solution for continuous-time, time-inhomogeneous Markov processes, and so researchers and practitioners often make the simplifying assumption that the process is piecewise time-homogeneous. In this article, we provide intuitions and illustrations of the potential biases in parameter estimation that may ensue in the more realistic scenario that the piecewise-homogeneous assumption is violated, and we advocate for a solution for likelihood computation in a truly time-inhomogeneous fashion. Particular focus is afforded to the context of multistate Markov models that allow for state label misclassifications, which applies more broadly to hidden Markov models (HMMs); Bayesian computations bypass the need for computationally demanding numerical gradient approximations otherwise required to obtain maximum likelihood estimates (MLEs). Supplemental materials are available online.
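The time-homogeneous closed form mentioned above, P(t) = exp(Qt), is easy to illustrate; the generator below is an arbitrary three-state example, not drawn from the article:

```python
import numpy as np
from scipy.linalg import expm

# Arbitrary illustrative generator: rows sum to zero, state 3 absorbing.
Q = np.array([[-0.30,  0.20,  0.10],
              [ 0.05, -0.15,  0.10],
              [ 0.00,  0.00,  0.00]])

P = expm(Q * 2.5)     # transition probability matrix P(t) at t = 2.5
print(P.sum(axis=1))  # each row sums to 1
```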
Abstract copyright UK Data Service and data collection copyright owner.

The Scottish Household Survey (SHS) is a continuous survey based on a sample of the general population in private residences in Scotland. It is financed by the Scottish Government (previously the Scottish Executive). The survey started in 1999 and up to 2011 followed a fairly consistent survey design. From 2012 onwards, the survey was substantially redesigned to include elements of the Scottish House Condition Survey (SHCS) (also available from the UK Data Service), including the physical survey. The SHS is run through a consortium led by Ipsos MORI. The survey is designed to provide reliable and up-to-date information on the composition, characteristics, attitudes and behaviour of private households and individuals, both nationally and at a sub-national level, and to examine the physical condition of Scotland's homes. It covers a wide range of topics to allow links to be made between different policy areas. Further information about the survey series, and links to publications, can be found on the Scottish Government's Scottish Household Survey webpages.

COVID-19 restrictions
Due to COVID-19 restrictions, the SHS was conducted by telephone or via MS Teams in 2020 and 2021 (SNs 9186 and 9187). Face-to-face interviewing resumed for SHS 2022 (SN 9294) when restrictions had been lifted.

Scottish Household Survey Lite
The SHS Lite dataset is a simplified version of the full Scottish Household Survey. To stimulate the use of SHS data, particularly amongst local authorities, voluntary organisations and academia, the Scottish Executive decided to commission a simplified data file, which would allow users to undertake most forms of analysis using a substantially smaller data file. The resulting dataset has had 1,700 variables removed. The full SHS dataset is both larger and more complex, containing around 30,000 cases for each two-year sweep of the survey and approximately 2,000 variables. The full 2003-2004 dataset is held at the UK Data Archive (UKDA) under SN 5020. The differences between the full SHS and the Lite dataset are that the number of variables has been reduced from 2,556 to 798, complex data loops have been removed and the original variables have been summarised in new variables. The variables have been organised into 'sets' of related variables. These sets can be used to further simplify accessing variables through SPSS dialog boxes. Some aspects of the data have not changed. For example, the number of cases remains over 30,000. With fewer variables, however, running analysis will be faster on most computers. The structure of the data continues to include questions that relate to both sections (household and random adult) of the questionnaire. The data still need to be weighted before the results can be considered representative of the household or adult populations. The variable names are still linked to the Computer Assisted Personal Interviewing (CAPI) script used to collect the data. The questionnaire will remain an important reference source for identifying and understanding the variables in the data. The documentation for the SHS Lite dataset includes a Microsoft Access variable database and an index of variable names. As it includes search functionality, this is available for download with the SHS Lite dataset, rather than via the 'Online Documentation' table below.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset of acceleration signals acquired from a low-cost Wireless Sensor Network (WSN) during seismic events that occurred in Central Italy. The WSN consists of 5 low-cost sensor nodes, each embedding an ADXL355 tri-axial MEMS accelerometer with a fixed sampling frequency of 250 Hz. Continuous data was acquired from February 2023 to the end of June 2023. The continuous data was then trimmed around the origin time of seismic events that occurred near the installation site, close to the city of Pollenza (MC), Italy, during the acquisition period. A total of 67 events were selected from the Italian Istituto Nazionale di Geofisica e Vulcanologia (INGV) Seismology data center. The waveform data was then further analyzed and annotated by analysts from INGV. Annotations include pick times for the P and S waves, and an uncertainty level for the annotations.
The data consist of two datasets in two folders: the dataset_earthquakes folder contains the earthquake traces, and the dataset_noise folder contains the noise-only traces.
The earthquake dataset consists of 328 3x25001 arrays, each related to a seismic event and with its own metadata. The dataset follows the SeisBench format, in which each trace name follows the convention 'bucket0$trace_number;:n_dimensions;:n_samples', where 'bucket0' indicates the block to which the trace belongs, 'trace_number' is the trace's index within the block, 'n_dimensions' denotes the number of measurement axes, and 'n_samples' represents the number of samples in the trace. The waveforms are included in the waveforms.hdf5 file of the dataset_earthquakes folder, while the metadata is in the metadata.csv file in the same folder. For each trace in the waveforms.hdf5 file there is an associated row in the metadata.csv file at the same index (indicated by 'trace_number' in the trace name).
The original miniSEED files that were analyzed by the INGV analysts are also made available. They are contained in the miniseed_files folder. Each file name follows the format '_eventID_originTime_WS.POZA.Sx.DNy.MSEED', where eventID is the ID of the event recorded in the trace, originTime is the origin time of the event in UTC (expressed in the YYYY-MM-DDThh:mm:ss.ssssss format), x is a number identifying the sensor that recorded the trace, and y indicates the measurement direction of the trace, named '1', '2', or 'Z'. For each trace in the waveforms.hdf5 file, the names of the miniSEED files that make up the trace are given in the metadata row for that trace, under the 'trace_name_original_1', 'trace_name_original_2', and 'trace_name_original_Z' fields in the metadata.csv file.
The dataset_noise folder follows the same convention. It contains a waveforms.hdf5 file with waveforms without seismic activity, a metadata.csv file with the metadata associated with each noise trace, and a miniSEED_files_noise folder with the original miniSEED files of the noise traces.
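A minimal sketch of pairing a metadata row with its waveform, assuming the usual SeisBench layout (buckets stored as datasets under a 'data' group in the HDF5 file, and a 'trace_name' column in metadata.csv); any field name not stated above is an assumption:

```python
import re
import h5py
import pandas as pd

meta = pd.read_csv("dataset_earthquakes/metadata.csv")
name = meta.loc[0, "trace_name"]           # e.g. 'bucket0$0;:3;:25001' per the convention above
bucket, rest = name.split("$")
idx = int(re.match(r"\d+", rest).group())  # trace index within the bucket

with h5py.File("dataset_earthquakes/waveforms.hdf5", "r") as f:
    trace = f["data"][bucket][idx]         # 3 x 25001 array (axes x samples)
```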