100+ datasets found

DataSheet1_Repeated Measures Correlation.pdf
frontiersin.figshare.com
pdf
Updated May 30, 2023
Cite
Jonathan Z. Bakdash; Laura R. Marusich (2023). DataSheet1_Repeated Measures Correlation.pdf [Dataset]. http://doi.org/10.3389/fpsyg.2017.00456.s001
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fpsyg.2017.00456.s001
Dataset updated
May 30, 2023
Dataset provided by
Frontiers
Authors
Jonathan Z. Bakdash; Laura R. Marusich
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Repeated measures correlation (rmcorr) is a statistical technique for determining the common within-individual association for paired measures assessed on two or more occasions for multiple individuals. Simple regression/correlation is often applied to non-independent observations or aggregated data; this may produce biased, specious results due to violation of independence and/or differing patterns between-participants versus within-participants. Unlike simple regression/correlation, rmcorr does not violate the assumption of independence of observations. Also, rmcorr tends to have much greater statistical power because neither averaging nor aggregation is necessary for an intra-individual research question. Rmcorr estimates the common regression slope, the association shared among individuals. To make rmcorr accessible, we provide background information for its assumptions and equations, visualization, power, and tradeoffs with rmcorr compared to multilevel modeling. We introduce the R package (rmcorr) and demonstrate its use for inferential statistics and visualization with two example datasets. The examples are used to illustrate research questions at different levels of analysis, intra-individual, and inter-individual. Rmcorr is well-suited for research questions regarding the common linear association in paired repeated measures data. All results are fully reproducible.
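The within-individual idea in this description can be illustrated with a minimal Python sketch. It assumes the common within-individual association can be obtained by centering each individual's measurements on that individual's own means and then correlating the centered values. This matches the common-slope description but leaves out the degrees of freedom, p-values, and confidence intervals that the rmcorr R package reports. The function names and toy data are hypothetical and are not part of that package.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def center_within(subjects, values):
    """Subtract each subject's own mean from that subject's values."""
    by_subject = defaultdict(list)
    for subject, value in zip(subjects, values):
        by_subject[subject].append(value)
    means = {s: sum(v) / len(v) for s, v in by_subject.items()}
    return [value - means[subject] for subject, value in zip(subjects, values)]


def within_correlation(subjects, x, y):
    """Correlation of x and y after removing between-subject differences."""
    return pearson(center_within(subjects, x), center_within(subjects, y))


# Hypothetical toy data: two subjects measured on three occasions each.
subjects = ["a", "a", "a", "b", "b", "b"]
x = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
y = [3.0, 2.0, 1.0, 13.0, 12.0, 11.0]

r_pooled = pearson(x, y)                       # ignores who each row belongs to
r_within = within_correlation(subjects, x, y)  # shared within-subject association
```

In this toy example the pooled correlation is strongly positive because the two subjects differ in level, while the within-subject association is negative, which is the kind of between-versus-within discrepancy the description warns about.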
Simulation Data Set
catalog.data.gov
s.cnmilf.com
Updated Nov 12, 2020
Cite
U.S. EPA Office of Research and Development (ORD) (2020). Simulation Data Set [Dataset]. https://catalog.data.gov/dataset/simulation-data-set
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects against identification of the spatial locations used in the analysis.

This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.

It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”.

Metadata (including data dictionary)
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

Code

Abstract
We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.

Description
“CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
“Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code is applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

Optional Information (complete as necessary)
Required R packages for running “CWVS_LMC.txt”:
• msm: Sampling from the truncated normal distribution
• mnormt: Sampling from the multivariate normal distribution
• BayesLogit: Sampling from the Polya-Gamma distribution
Required R packages for running “Results_Summary.txt”:
• plotrix: Plotting the posterior means and credible intervals

Instructions for Use

Reproducibility (Mandatory)
What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.
How to use the information:
• Load the “Simulated_Dataset.RData” workspace
• Run the code contained in “CWVS_LMC.txt”
• Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”.

Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set:

Data
The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

Availability
Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement.

Permissions: These are simulated data without any identifying information or informative birth-level covariates. The pollution exposures are standardized by week using the median and IQR, as described at the start of this description, and the medians and IQRs are not given.

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
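The weekly median/IQR standardization mentioned in this entry can be sketched in a few lines of Python. This is an illustration only: the entry's own code is in R, the function name and sample values are hypothetical, and the quartile convention (Python's "inclusive" method) is an assumption, since the description does not say which definition of the IQR was used.

```python
from statistics import median, quantiles


def standardize_week(exposures):
    """Subtract the week's median and divide by its interquartile range."""
    # Assumption: quartiles computed with the "inclusive" method.
    q1, _, q3 = quantiles(exposures, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; exposures cannot be scaled")
    week_median = median(exposures)
    return [(value - week_median) / iqr for value in exposures]


# Hypothetical raw exposures for one week across five simulated individuals.
week_1 = [8.0, 10.0, 12.0, 14.0, 16.0]
z_week_1 = standardize_week(week_1)
```

Applying the function to each week separately would give one column of a matrix like z in the data dictionary. Because the medians and IQRs are withheld, the original exposure scale cannot be recovered from the standardized values alone.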
Statistical Analysis of Individual Participant Data Meta-Analyses: A...
plos.figshare.com
datasetcatalog.nlm.nih.gov
tiff
Updated Jun 8, 2023
Cite
Gavin B. Stewart; Douglas G. Altman; Lisa M. Askie; Lelia Duley; Mark C. Simmonds; Lesley A. Stewart (2023). Statistical Analysis of Individual Participant Data Meta-Analyses: A Comparison of Methods and Recommendations for Practice [Dataset]. http://doi.org/10.1371/journal.pone.0046042
Available download formats: tiff
Unique identifier
https://doi.org/10.1371/journal.pone.0046042
Dataset updated
Jun 8, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Gavin B. Stewart; Douglas G. Altman; Lisa M. Askie; Lelia Duley; Mark C. Simmonds; Lesley A. Stewart
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background
Individual participant data (IPD) meta-analyses that obtain “raw” data from studies rather than summary data typically adopt a “two-stage” approach to analysis whereby IPD within trials generate summary measures, which are combined using standard meta-analytical methods. Recently, a range of “one-stage” approaches which combine all individual participant data in a single meta-analysis have been suggested as providing a more powerful and flexible approach. However, they are more complex to implement and require statistical support. This study uses a dataset to compare “two-stage” and “one-stage” models of varying complexity, to ascertain whether results obtained from the approaches differ in a clinically meaningful way.

Methods and Findings
We included data from 24 randomised controlled trials evaluating antiplatelet agents for the prevention of pre-eclampsia in pregnancy. We performed two-stage and one-stage IPD meta-analyses to estimate overall treatment effect and to explore potential treatment interactions whereby particular types of women and their babies might benefit differentially from receiving antiplatelets. Two-stage and one-stage approaches gave similar results, showing a benefit of using antiplatelets (relative risk 0.90, 95% CI 0.84 to 0.97). Neither approach suggested that any particular type of women benefited more or less from antiplatelets. There were no material differences in results between different types of one-stage model.

Conclusions
For these data, two-stage and one-stage approaches to analysis produce similar results. Although one-stage models offer a flexible environment for exploring model structure and are useful where across-study patterns relating to types of participant, intervention and outcome mask similar relationships within trials, the additional insights provided by their usage may not outweigh the costs of statistical support for routine application in syntheses of randomised controlled trials. Researchers considering undertaking an IPD meta-analysis should not necessarily be deterred by a perceived need for sophisticated statistical methods when combining information from large randomised trials.
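The “two-stage” approach described in this entry can be sketched as follows. The sketch assumes a fixed-effect, inverse-variance pooling of log relative risks as the "standard meta-analytical method" in the second stage; the study itself may have used a different model. The function names and trial counts are hypothetical and are not taken from the 24 trials.

```python
from math import exp, log, sqrt


def log_relative_risk(events_trt, n_trt, events_ctl, n_ctl):
    """Stage one: one trial's log relative risk and its approximate variance."""
    log_rr = log((events_trt / n_trt) / (events_ctl / n_ctl))
    variance = 1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl
    return log_rr, variance


def pool_fixed_effect(estimates):
    """Stage two: inverse-variance weighted average of the trial estimates."""
    weights = [1 / variance for _, variance in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    se = sqrt(1 / sum(weights))
    # Return the relative risk with an approximate 95% confidence interval.
    return exp(pooled), exp(pooled - 1.96 * se), exp(pooled + 1.96 * se)


# Hypothetical counts per trial:
# (events in treated, treated total, events in control, control total)
trials = [(30, 400, 36, 400), (45, 500, 50, 500), (12, 150, 15, 150)]
stage_one = [log_relative_risk(*trial) for trial in trials]
rr, ci_low, ci_high = pool_fixed_effect(stage_one)
```

A one-stage analysis would instead fit a single model to all participant-level records, typically with trial as a stratification or random-effect term, which is where the additional statistical support mentioned in the conclusions comes in.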
SAPRIN Individual Demographic Dataset 2018 - South Africa
datafirst.uct.ac.za
Updated Jul 9, 2020
+ more versions
Cite
Dr Kobus Herbst (2020). SAPRIN Individual Demographic Dataset 2018 - South Africa [Dataset]. http://www.datafirst.uct.ac.za/Dataportal/index.php/catalog/study/zaf-saprin-sidd-2018-v1
Dataset updated
Jul 9, 2020
Dataset provided by
Prof Mark Collinson
Dr Kobus Herbst
Prof Marianne Alberts
Prof Steve Tollman
Prof Deenan Pillay
Time period covered
1993 - 2017
Area covered
South Africa
Description
Abstract

The South African Population Research Infrastructure Network (SAPRIN) is a national research infrastructure funded through the Department of Science and Technology and hosted by the South African Medical Research Council. One of SAPRIN’s initial goals has been to harmonise the legacy longitudinal data from the three current Health and Demographic Surveillance System (HDSS) Nodes. These long-standing nodes are the MRC/Wits University Agincourt HDSS in Bushbuckridge District, Mpumalanga, established in 1993, with a population of 116 000 people; the University of Limpopo DIMAMO HDSS in the Capricorn District of Limpopo, established in 1996, with a current population of 100 000; and the Africa Health Research Institute (AHRI) HDSS in uMkhanyakude District, KwaZulu-Natal, established in 2000, with a current population of 125 000.

SAPRIN data are processed for longitudinal analysis by organising the demographic data into residence episodes at a geographical location, and membership episodes within a household. Start events include enumeration, birth, in-migration and relocating into a household from within the study population; exit events include death (by cause), out-migration, and relocating to another location in the study population. Variables routinely updated at individual level include health care utilisation, marital status, labour status and education status; household asset status is also recorded. Anticipated outcomes of SAPRIN include: (i) regular releases of up-to-date, longitudinal data, representative of South Africa’s fast-changing poorer communities for research, interpretation and calibration of national datasets; (ii) national statistics triangulation, whereby longitudinal SAPRIN data are triangulated with National Census data for calibration of national statistics and studying the mechanisms driving the national statistics; (iii) an interdisciplinary research platform for conducting observational and interventional research at population level; (iv) policy engagement to provide evidence to underpin policy-making for cost evaluation and targeting intervention programmes, thereby improving the accuracy and efficiency of pro-poor, health and wellbeing interventions; (v) scientific education through training at related universities; and (vi) community engagement, whereby coordinated engagement with communities will enable two-way learning between researchers and community members, and enable research site communities and service providers to have access to and make effective use of research results.

Geographic coverage

The Agincourt HDSS covers an area of approximately 420 km2 and is located in Bushbuckridge District, Mpumalanga in the rural north-east of South Africa close to the Mozambique border. DIMAMO is located in the Capricorn district, Limpopo Province approximately 40 km from Polokwane, the capital city of Limpopo Province and 15-50 km from the University of Limpopo (Turfloop Campus). The site covers an area of approximately 200 km2. AHRI is situated in the south-east portion of the Umkhanyakude district of KwaZulu-Natal province near the town of Mtubatuba. It is bounded on the west by the Umfolozi-Hluhluwe nature reserve, on the south by the Umfolozi river, on the east by the N2 highway (except for portions where the KwaMsane township straddles the highway) and in the north by the Inyalazi river for portions of the boundary. The area is 438 km2.

Analysis unit

Exposure episodes

Universe

Households resident in dwellings within the study area will be eligible for inclusion in the household component of SAPRIN. All individuals identified by the household proxy informant as a member of the household will be enumerated. A resident household member is an individual that intends to sleep the majority of time at the dwelling occupied by the household over a four-month period. Households will include resident and non-resident members. An individual is a non-resident member if they have close ties to the household, but do not physically reside with the household most of the time. They can also be called temporary migrants and they are enumerated within the household list. Because household membership is not tied to physical residency, an individual may be a member of more than one household.

Kind of data

Event/transaction data

Sampling procedure

This dataset is not based on a sample but contains information from the complete demographic surveillance areas.
The dataset contains PII
catalog.data.gov
gimi9.com
Updated Nov 12, 2020
Cite
U.S. EPA Office of Research and Development (ORD) (2020). The dataset contains PII [Dataset]. https://catalog.data.gov/dataset/the-dataset-contains-pii
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
These data are interview transcripts with individuals who are users of the Smoke Sense app. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: This data is available on request to approved individuals. Format: This data contains PII. These are interview transcripts. This dataset is associated with the following publication: Hano, M., L. Wei, B. Hubbell, and A. Rappold. Scaling Up: Citizen Science Engagement and Impacts Beyond the Individual. Citizen Science: Theory and Practice. Ubiquity Press, London, UK, 5(1): 1-13, (2020).
Ohio Vital Statistics Birth and Autism Data
gimi9.com
datasets.ai
+1 more
Updated Sep 2, 2024
Cite
(2024). Ohio Vital Statistics Birth and Autism Data [Dataset]. https://gimi9.com/dataset/data-gov_ohio-vital-statistics-birth-and-autism-data/
Dataset updated
Sep 2, 2024
Area covered
Ohio
Description
Input datasets on Ohio Birth and Autism will not be made accessible to the public because they include individual-level data with PII. Output data are all available in tabulated form within the published manuscript. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Input data can be obtained by application to the owners of the data (Children's Hospital and Ohio Department of Health). The tabulated output data are found in the manuscript. Format: Input datasets on Ohio Birth and Autism will not be made accessible to the public because they include individual-level data with PII. Output data are all available in tabulated form within the published manuscript (e.g., results of regression models, measures of central tendency, population characteristics, etc.). This dataset is associated with the following publication: Kaufman, J., M. Wright, G. Rice, N. Connolly, K. Bowers, and J. Anixt. AMBIENT OZONE AND FINE PARTICULATE MATTER EXPOSURES AND AUTISM SPECTRUM DISORDER IN METROPOLITAN CINCINNATI, OHIO. ENVIRONMENTAL RESEARCH. Elsevier B.V., Amsterdam, NETHERLANDS, 171: 218-227, (2019).
COVID-19 Case Surveillance Public Use Data
catalog.data.gov
opendatalab.com
+6 more
Updated Mar 3, 2022
Cite
Centers for Disease Control and Prevention (2022). COVID-19 Case Surveillance Public Use Data [Dataset]. https://catalog.data.gov/dataset/covid-19-case-surveillance-public-use-data
Dataset updated
Mar 3, 2022
Dataset provided by
Centers for Disease Control and Prevention (http://www.cdc.gov/)
Description
Beginning March 1, 2022, the "COVID-19 Case Surveillance Public Use Data" will be updated on a monthly basis. This case surveillance public use dataset has 12 elements for all COVID-19 cases shared with CDC and includes demographics, any exposure history, disease severity indicators and outcomes, presence of any underlying medical conditions and risk behaviors, and no geographic data.

CDC has three COVID-19 case surveillance datasets:
• COVID-19 Case Surveillance Public Use Data with Geography: Public use, patient-level dataset with clinical data (including symptoms), demographics, and county and state of residence. (19 data elements)
• COVID-19 Case Surveillance Public Use Data: Public use, patient-level dataset with clinical and symptom data and demographics, with no geographic data. (12 data elements)
• COVID-19 Case Surveillance Restricted Access Detailed Data: Restricted access, patient-level dataset with clinical and symptom data, demographics, and state and county of residence. Access requires a registration process and a data use agreement. (32 data elements)

The following apply to all three datasets:
• Data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.
• Data are considered provisional by CDC and are subject to change until the data are reconciled and verified with the state and territorial data providers.
• Some data cells are suppressed to protect individual privacy.
• The datasets will include all cases with the earliest date available in each record (date received by CDC or date related to illness/specimen collection) at least 14 days prior to the creation of the previously updated datasets. This 14-day lag allows case reporting to be stabilized and ensures that time-dependent outcome data are accurately captured.
• Datasets are updated monthly.
• Datasets are created using CDC’s operational Policy on Public Health Research and Nonresearch Data Management and Access and include protections designed to protect individual privacy.

For more information about data collection and reporting, please see https://wwwn.cdc.gov/nndss/data-collection.html
For more information about the COVID-19 case surveillance data, please see https://www.cdc.gov/coronavirus/2019-ncov/covid-data/faq-surveillance.html

Overview
The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020 to clarify the interpretation of antigen detection tests and serologic test results within the case classification. The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported volun

Latest Data Professionals Salary Dataset

kaggle.com

Updated Jul 9, 2023

Cite

Aman Chauhan (2023). Latest Data Professionals Salary Dataset [Dataset]. https://www.kaggle.com/datasets/whenamancodes/data-professionals-salary-dataset-2022

Explore at:

Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Jul 9, 2023

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Aman Chauhan

Description

About Dataset

Context

Analytics refers to the methodical examination and calculation of data or statistics. Its purpose is to uncover, interpret, and convey meaningful patterns found within the data. Additionally, analytics involves utilizing these data patterns to make informed decisions. It proves valuable in domains abundant with recorded information, employing a combination of statistics, computer programming, and operations research to measure performance.

Businesses can leverage analytics to describe, predict, and enhance their overall performance. Various branches of analytics encompass predictive analytics, prescriptive analytics, enterprise decision management, descriptive analytics, cognitive analytics, Big Data Analytics, retail analytics, supply chain analytics, store assortment and stock-keeping unit optimization, marketing optimization and marketing mix modeling, web analytics, call analytics, speech analytics, sales force sizing and optimization, price and promotion modeling, predictive science, graph analytics, credit risk analysis, and fraud analytics. Due to the extensive computational requirements involved (particularly with big data), analytics algorithms and software utilize state-of-the-art methods from computer science, statistics, and mathematics.

Data Dictionary

Columns	Description
Company Name	Company Name refers to the name of the organization or company where an individual is employed. It represents the specific entity that provides job opportunities and is associated with a particular industry or sector.
Job Title	Job Title refers to the official designation or position held by an individual within a company or organization. It represents the specific role or responsibilities assigned to the person in their professional capacity.
Salaries Reported	Salaries Reported indicates the information or data related to the salaries of employees within a company or industry. This data may be collected and reported through various sources, such as surveys, employee disclosures, or public records.
Location	Location refers to the specific geographical location or area where a company or job position is situated. It provides information about the physical location or address associated with the company's operations or the job's work environment.
Salary	Salary refers to the monetary compensation or remuneration received by an employee in exchange for their work or services. It represents the amount of money paid to an individual on a regular basis, typically in the form of wages or a fixed annual income.

Content

This Dataset consists of salaries for Data Scientists, Machine Learning Engineers, Data Analysts, and Data Engineers in various cities across India (2022).

- Salary Dataset.csv
- Partially Cleaned Salary Dataset.csv

Acknowledgements

This Dataset is created from https://www.glassdoor.co.in/. If you want to learn more, you can visit the Website.

2018 Census Individual (part 3a) total New Zealand by Statistical Area 1
datafinder.stats.govt.nz
csv, dwg, geodatabase +6
Updated May 18, 2020
Cite
Stats NZ (2020). 2018 Census Individual (part 3a) total New Zealand by Statistical Area 1 [Dataset]. https://datafinder.stats.govt.nz/layer/104621-2018-census-individual-part-3a-total-new-zealand-by-statistical-area-1/
Available download formats: mapinfo tab, pdf, mapinfo mif, geodatabase, kml, shapefile, csv, geopackage / sqlite, dwg
Dataset updated
May 18, 2020
Dataset provided by
Statistics New Zealand (http://www.stats.govt.nz/)
Authors
Stats NZ
License
https://datafinder.stats.govt.nz/license/attribution-4-0-international/
Area covered
New Zealand
Description
This individual (part 3a) dataset is displayed by statistical area 1 geography and contains information on:

• Work and labour force status

• Status in employment

• Occupation – major group, by usual residence address

• Occupation – major group, by workplace address*

• Industry (division), by usual residence address

• Industry (division), by workplace address*

* Workplace address is coded from information supplied by respondents about their workplaces. Where respondents do not supply sufficient information, their responses are coded to ‘not further defined’. The statistical area 1 dataset for 2018 Census excludes these ‘not further defined’ areas.

This dataset contains counts at statistical area 1 for selected variables from the 2018, 2013, and 2006 censuses. The geography corresponds to 2018 boundaries.

The data uses fixed random rounding to protect confidentiality. Some counts of less than 6 are suppressed according to 2018 confidentiality rules. Values of ‘-999’ indicate suppressed data.
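When working with the CSV download, the ‘-999’ marker needs to be treated as missing rather than as a count. The following Python sketch shows one way to do that. The column names and sample rows are hypothetical and do not reflect the actual Stats NZ file layout.

```python
import csv
import io

SUPPRESSED = "-999"  # marker for suppressed cells, as stated in the dataset notes


def read_counts(csv_text, count_column):
    """Return {area code: count or None}, mapping suppressed cells to None."""
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row[count_column].strip()
        counts[row["area_code"]] = None if raw == SUPPRESSED else int(raw)
    return counts


# Hypothetical extract; the column names are illustrative only.
sample = "area_code,employed_full_time\n7000001,42\n7000002,-999\n7000003,9\n"
counts = read_counts(sample, "employed_full_time")

# Sum only the published cells; suppressed areas are left out of the total.
known_total = sum(value for value in counts.values() if value is not None)
```

Because the published counts are randomly rounded, totals built this way may differ slightly from totals published at higher geographic levels.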

For further information on this dataset please refer to the Statistical area 1 dataset for 2018 Census webpage - footnotes for individual part 3a, Excel workbooks, and CSV files are available to download. Data quality ratings for 2018 Census variables, summarising the quality rating and priority levels for 2018 Census variables, are available.

For information on the statistical area 1 geography please refer to the Statistical standard for geographic areas 2018.
HIRENASD Experimental Data, Individual Plots
data.nasa.gov
data.staging.idas-ds1.appdat.jsc.nasa.gov
+2 more
Updated Mar 31, 2025
+ more versions
Cite
nasa.gov (2025). HIRENASD Experimental Data, Individual Plots [Dataset]. https://data.nasa.gov/dataset/hirenasd-experimental-data-individual-plots
Dataset updated
Mar 31, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
The HIRENASD data produced by analyzing the experimental data are repeated on this website for those who cannot download the information in the zip format found on the primary Experimental Data page, or who wish to examine the plots of the data online.
Website Statistics
data.wu.ac.at
data.europa.eu
csv, pdf
Updated Jun 11, 2018
Cite
Lincolnshire County Council (2018). Website Statistics [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/M2ZkZDBjOTUtMzNhYi00YWRjLWI1OWMtZmUzMzA5NjM0ZTdk
Available download formats: csv, pdf
Dataset updated
Jun 11, 2018
Dataset provided by
Lincolnshire County Council (http://www.lincolnshire.gov.uk/)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
This Website Statistics dataset has four resources showing usage of the Lincolnshire Open Data website. Web analytics terms used in each resource are defined in their accompanying Metadata file.

Website Usage Statistics: This document shows a statistical summary of usage of the Lincolnshire Open Data site for the latest calendar year.

Website Statistics Summary: This dataset shows a website statistics summary for the Lincolnshire Open Data site for the latest calendar year.

Webpage Statistics: This dataset shows statistics for individual Webpages on the Lincolnshire Open Data site by calendar year.

Dataset Statistics: This dataset shows cumulative totals for Datasets on the Lincolnshire Open Data site that have also been published on the national Open Data site Data.Gov.UK - see the Source link.

Note: Website and Webpage statistics (the first three resources above) show only UK users, and exclude API calls (automated requests for datasets). The Dataset Statistics are confined to users with javascript enabled, which excludes web crawlers and API calls.

These Website Statistics resources are updated annually in January by the Lincolnshire County Council Business Intelligence team. For any enquiries about the information contact opendata@lincolnshire.gov.uk.
Quarterly Labour Force Survey Household Dataset, April - June, 2021
beta.ukdataservice.ac.uk
Updated 2023
+ more versions
Cite
Office For National Statistics (2023). Quarterly Labour Force Survey Household Dataset, April - June, 2021 [Dataset]. http://doi.org/10.5255/ukda-sn-8852-3
Explore at:
Unique identifier
https://doi.org/10.5255/ukda-sn-8852-3
Dataset updated
2023
Dataset provided by
DataCite (https://www.datacite.org/)
UK Data Service (https://ukdataservice.ac.uk/)
Authors
Office For National Statistics
Description
Background
The Labour Force Survey (LFS) is a unique source of information using international definitions of employment and unemployment and economic inactivity, together with a wide range of related topics such as occupation, training, hours of work and personal characteristics of household members aged 16 years and over. It is used to inform social, economic and employment policy. The LFS was first conducted biennially from 1973-1983. Between 1984 and 1991 the survey was carried out annually and consisted of a quarterly survey conducted throughout the year and a 'boost' survey in the spring quarter (data were then collected seasonally). From 1992 quarterly data were made available, with a quarterly sample size approximately equivalent to that of the previous annual data. The survey then became known as the Quarterly Labour Force Survey (QLFS). From December 1994, data gathering for Northern Ireland moved to a full quarterly cycle to match the rest of the country, so the QLFS then covered the whole of the UK (though some additional annual Northern Ireland LFS datasets are also held at the UK Data Archive). Further information on the background to the QLFS may be found in the documentation.

Household datasets
Up to 2015, the LFS household datasets were produced twice a year (April-June and October-December) from the corresponding quarter's individual-level data. From January 2015 onwards, they are produced each quarter alongside the main QLFS. The household datasets include all the usual variables found in the individual-level datasets, with the exception of those relating to income, and are intended to facilitate the analysis of the economic activity patterns of whole households. It is recommended that the existing individual-level LFS datasets continue to be used for any analysis at individual level, and that the LFS household datasets be used for analysis involving household or family-level data. From January 2011, a pseudonymised household identifier variable (HSERIALP) is also included in the main quarterly LFS dataset.

Change to coding of missing values for household series
From 1996-2013, all missing values in the household datasets were set to one '-10' category instead of the separate '-8' and '-9' categories. For that period, the ONS introduced a new imputation process for the LFS household datasets and it was necessary to code the missing values into one new combined category ('-10'), to avoid over-complication. This was also in line with the Annual Population Survey household series of the time. The change was applied to the back series during 2010 to ensure continuity for analytical purposes. From 2013 onwards, the -8 and -9 categories have been reinstated.
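
A minimal sketch of how an analyst might put the two coding schemes on a common footing when comparing periods. The codes '-8', '-9' and '-10' come from the description above; the function name and the example values are hypothetical, and this is not an ONS procedure.

```python
# Codes taken from the description above; everything else is illustrative.
SEPARATE_MISSING_CODES = {-8, -9}
COMBINED_MISSING_CODE = -10


def harmonise_missing(value):
    """Map the separate '-8'/'-9' codes onto the combined '-10' code."""
    return COMBINED_MISSING_CODE if value in SEPARATE_MISSING_CODES else value


# Hypothetical values: one file using the separate codes, one using '-10'.
values_separate = [1, -8, 3, -9]
values_combined = [2, -10, 1, -10]

harmonised = [harmonise_missing(v) for v in values_separate + values_combined]
```

Collapsing to the combined code loses the distinction between the two missing-value categories, so it only suits analyses that treat all missing values alike.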

LFS Documentation
The documentation available from the Archive to accompany LFS datasets largely consists of the latest version of each volume alongside the appropriate questionnaire for the year concerned. However, LFS volumes are updated periodically by ONS, so users are advised to check the ONS LFS User Guidance page before commencing analysis.

Additional data derived from the QLFS
The Archive also holds further QLFS series: End User Licence (EUL) quarterly datasets; Secure Access datasets (see below); two-quarter and five-quarter longitudinal datasets; quarterly, annual and ad hoc module datasets compiled for Eurostat; and some additional annual Northern Ireland datasets.

End User Licence and Secure Access QLFS Household datasets
Users should note that there are two discrete versions of the QLFS household datasets. One is available under the standard End User Licence (EUL) agreement, and the other is a Secure Access version. Secure Access household datasets for the QLFS are available from 2009 onwards, and include additional, detailed variables not included in the standard EUL versions. Extra variables that typically can be found in the Secure Access versions but not in the EUL versions relate to: geography; date of birth, including day; education and training; household and family characteristics; employment; unemployment and job hunting; accidents at work and work-related health problems; nationality, national identity and country of birth; occurrence of learning difficulty or disability; and benefits. For full details of variables included, see data dictionary documentation. The Secure Access version (see SN 7674) has more restrictive access conditions than those made available under the standard EUL. Prospective users will need to gain ONS Accredited Researcher status, complete an extra application form and demonstrate to the data owners exactly why they need access to the additional variables. Users are strongly advised to first obtain the standard EUL version of the data to see if they are sufficient for their research requirements.

Changes to variables in QLFS Household EUL datasets
In order to further protect respondent confidentiality, ONS have made some changes to variables available in the EUL datasets. From July-September 2015 onwards, 4-digit industry class is available for main job only, meaning that 3-digit industry group is the most detailed level available for second and last job.

Review of imputation methods for LFS Household data - changes to missing values
A review of the imputation methods used in LFS Household and Family analysis resulted in a change from the January-March 2015 quarter onwards. It was no longer considered appropriate to impute any personal characteristic variables (e.g. religion, ethnicity, country of birth, nationality, national identity, etc.) using the LFS donor imputation method. This method is primarily intended to ensure that the 'economic status' of all individuals within a household is known, allowing analysis of the combined economic status of households. This means that from 2015 a larger number of missing values ('-8'/'-9') will be present in the data for these personal characteristic variables than before. Therefore, if users need to carry out any time series analysis of households/families which also includes personal characteristic variables covering this time period, it is advised to filter off 'ioutcome=3' cases from all periods to remove this inconsistent treatment of non-responders.
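
The recommended filter can be sketched as follows. Only the variable name `ioutcome` and the value 3 come from the text above; the record layout, the other field names and the helper name are assumptions for illustration, and real LFS files may use different casing for variable names.

```python
def drop_ioutcome_3(records):
    """Return only the records whose 'ioutcome' value is not 3."""
    return [r for r in records if r.get("ioutcome") != 3]


# Hypothetical person-level records pooled from several quarters.
pooled = [
    {"quarter": "2014Q2", "ioutcome": 1, "ethnicity": 2},
    {"quarter": "2014Q2", "ioutcome": 3, "ethnicity": 1},
    {"quarter": "2016Q2", "ioutcome": 3, "ethnicity": -9},
    {"quarter": "2016Q2", "ioutcome": 2, "ethnicity": 4},
]

consistent = drop_ioutcome_3(pooled)
```

Applying the same filter to every period, rather than only to 2015 onwards, is what keeps the series comparable.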

Occupation data for 2021 and 2022 data files
The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. Further information can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022 (https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022).

Latest edition information
For the third edition (September 2023), the variables NSECM20, NSECMJ20, SC2010M, SC20SMJ, SC20SMN, SOC20M and SOC20O have been replaced with new versions. Further information on the SOC revisions can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022 (https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022).
International Comprehensive Ocean-Atmosphere Data Set (ICOADS) Release 3,...
data.ucar.edu
rda-web-prod.ucar.edu
+2more
hdf
Updated Aug 2, 2025
+ more versions
Cite
Center for Ocean-Atmospheric Prediction Studies, Florida State University; Cooperative Institute for Research in Environmental Sciences, University of Colorado; Department of Atmospheric Science, University of Washington; Deutscher Wetterdienst (German Meteorological Service), Germany; Met Office, Ministry of Defence, United Kingdom; National Centers for Environmental Information, NESDIS, NOAA, U.S. Department of Commerce; National Oceanography Centre, University of Southampton; Physical Sciences Laboratory, Earth System Research Laboratory, OAR, NOAA, U.S. Department of Commerce; Research Data Archive, Computational and Information Systems Laboratory, National Center for Atmospheric Research, University Corporation for Atmospheric Research (2025). International Comprehensive Ocean-Atmosphere Data Set (ICOADS) Release 3, Individual Observations [Dataset]. http://doi.org/10.5065/D6ZS2TR3
Explore at:
Available download formats: hdf
Unique identifier
https://doi.org/10.5065/D6ZS2TR3
Dataset updated
Aug 2, 2025
Dataset provided by
Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory
Authors
Center for Ocean-Atmospheric Prediction Studies, Florida State University; Cooperative Institute for Research in Environmental Sciences, University of Colorado; Department of Atmospheric Science, University of Washington; Deutscher Wetterdienst (German Meteorological Service), Germany; Met Office, Ministry of Defence, United Kingdom; National Centers for Environmental Information, NESDIS, NOAA, U.S. Department of Commerce; National Oceanography Centre, University of Southampton; Physical Sciences Laboratory, Earth System Research Laboratory, OAR, NOAA, U.S. Department of Commerce; Research Data Archive, Computational and Information Systems Laboratory, National Center for Atmospheric Research, University Corporation for Atmospheric Research
Time period covered
Oct 15, 1662 - Jun 30, 2025
Description
The International Comprehensive Ocean-Atmosphere Data Set (ICOADS) is a global ocean marine meteorological and surface ocean dataset. It is formed by merging many national and international data sources that contain measurements and visual observations from ships (merchant, navy, research), moored and drifting buoys, coastal stations, and other marine and near-surface ocean platforms. Each marine report contains individual observations of meteorological and oceanographic variables, such as sea surface and air temperatures, wind, pressure, humidity, and cloudiness. The coverage is global and sampling density varies depending on date and geographic position relative to shipping routes and ocean observing systems. The latest Release 3.0 (R3.0) of ICOADS covers 1662-2014, and is coupled with improved "preliminary" monthly data and product extensions past 2014. R3.0 includes changes designed to enable more effective exchange of information describing data quality between ICOADS, reanalysis centers, data set developers, scientists and the public. These user-driven innovations include the assignment of a unique identifier (UID) to each marine report, to enable tracing of observations, linking with reports and improved data sharing. Other revisions and extensions of the International Maritime Meteorological Archive (IMMA) common data format incorporate new near-surface oceanographic data elements and cloud parameters. Many new input data and metadata sources have been assembled, and updates and improvements to existing data sources, or removal of erroneous data, made.
Data from: MobileWell400+: A Large-Scale Multivariate Longitudinal Mobile...
produccioncientifica.ucm.es
zenodo.org
Updated 2024
Cite
Banos, Oresti; Damas, Miguel; Goicoechea, Carmen; Perakakis, Pandelis; Pomares, Hector; Rodriguez-Leon, Ciro; Sanabria, Daniel; Villalonga, Claudia (2024). MobileWell400+: A Large-Scale Multivariate Longitudinal Mobile Dataset for Investigating Individual and Collective Well-Being [Dataset]. https://produccioncientifica.ucm.es/documentos/668fc499b9e7c03b01be2372
Explore at:
Dataset updated
2024
Authors
Banos, Oresti; Damas, Miguel; Goicoechea, Carmen; Perakakis, Pandelis; Pomares, Hector; Rodriguez-Leon, Ciro; Sanabria, Daniel; Villalonga, Claudia
Description
This study engaged 409 participants over a period spanning from July 10 to August 8, 2023, ensuring representation across various demographic factors: 221 females, 186 males, 2 non-binary, year of birth between 1951 and 2005, with varied annual incomes and from 15 Spanish regions. The MobileWell400+ dataset, openly accessible, encompasses a wide array of data collected via the participants' mobile phone, including demographic, emotional, social, behavioral, and well-being data. Methodologically, the project presents a promising avenue for uncovering new social, behavioral, and emotional indicators, supplementing existing literature. Notably, artificial intelligence is considered to be instrumental in analysing these data, discerning patterns, and forecasting trends, thereby advancing our comprehension of individual and population well-being. Ethical standards were upheld, with participants providing informed consent.

The following is a non-exhaustive list of collected data:

Data continuously collected through the participants' smartphone sensors: physical activity (resting, walking, driving, cycling, etc.), name of detected WiFi networks, connectivity type (WiFi, mobile, none), ambient light, ambient noise, and status of the device screen (on, off, locked, unlocked).

Data corresponding to an initial survey prompted via the smartphone, with information related to demographic data, effects and COVID vaccination, average hours of physical activity, and answers to a series of questions to measure mental health, many of them taken from internationally recognised psychological and well-being scales (PANAS, PHQ, GAD, BRS and AAQ), social isolation (TILS) and economic inequality perception.

Data corresponding to daily surveys prompted via the smartphone, where variables related to mood (valence, activation, energy and emotional events) and social interaction (quantity and quality) are measured.

Data corresponding to weekly surveys prompted via the smartphone, where information on overall health, hours of physical activity per week, loneliness, and questions related to well-being are asked.

Data corresponding to a final survey prompted via the smartphone, consisting of similar questions to the ones asked in the initial survey, namely psychological and well-being items (PANAS, PHQ, GAD, BRS and AAQ), social isolation (TILS) and economic inequality perception questions.

For a more detailed description of the study please refer to MobileWell400+StudyDescription.pdf.

For a more detailed description of the collected data, variables and data files please refer to MobileWell400+FilesDescription.pdf.
Individual difference measures: Descriptive statistics and variable...
datasetcatalog.nlm.nih.gov
plos.figshare.com
+1more
Updated Apr 6, 2023
Cite
Deist, Melanie; Fourie, Melike M. (2023). Individual difference measures: Descriptive statistics and variable intercorrelations. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001091396
Explore at:
Dataset updated
Apr 6, 2023
Authors
Deist, Melanie; Fourie, Melike M.
Description
Individual difference measures: Descriptive statistics and variable intercorrelations.
Synthetic Data for an Imaginary Country, Sample, 2023 - World
microdata.worldbank.org
nada-demo.ihsn.org
Updated Jul 7, 2023
+ more versions
Cite
Development Data Group, Data Analytics Unit (2023). Synthetic Data for an Imaginary Country, Sample, 2023 - World [Dataset]. https://microdata.worldbank.org/index.php/catalog/5906
Explore at:
Dataset updated
Jul 7, 2023
Dataset authored and provided by
Development Data Group, Data Analytics Unit
Time period covered
2023
Area covered
World
Description
Abstract

The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other one with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, assets ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.

The full-population dataset (with about 10 million individuals) is also distributed as open data.

Geographic coverage

The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.

Analysis unit

Household, Individual

Universe

The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.

Kind of data

ssd

Sampling procedure

The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
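
The two-stage design described above can be sketched in Python. This is an illustration only, not the R script distributed with the dataset: the stratum sizes are hypothetical, and the largest-remainder rounding used to turn proportional quotas into whole numbers of enumeration areas is an assumption, since the description does not say how rounding was handled.

```python
import random

SAMPLE_SIZE = 8000        # households, from the description
HOUSEHOLDS_PER_EA = 25    # households per enumeration area, from the description
TOTAL_EAS = SAMPLE_SIZE // HOUSEHOLDS_PER_EA  # 320 enumeration areas


def allocate_eas(stratum_sizes, total_eas):
    """Stage 1: allocate enumeration areas proportionally to stratum size.

    Fractional quotas are rounded with the largest-remainder rule
    (an assumption made for this sketch).
    """
    total = sum(stratum_sizes.values())
    quotas = {s: total_eas * n / total for s, n in stratum_sizes.items()}
    allocation = {s: int(q) for s, q in quotas.items()}
    leftover = total_eas - sum(allocation.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - allocation[s], reverse=True)
    for stratum in by_remainder[:leftover]:
        allocation[stratum] += 1
    return allocation


def draw_households(ea_household_ids, rng):
    """Stage 2: draw a fixed number of households at random within one area."""
    return rng.sample(ea_household_ids, HOUSEHOLDS_PER_EA)


# Hypothetical strata (geo_1 by urban/rural) with made-up household counts.
strata = {
    ("province_a", "urban"): 500_000,
    ("province_a", "rural"): 300_000,
    ("province_b", "urban"): 150_000,
    ("province_b", "rural"): 50_000,
}
allocation = allocate_eas(strata, TOTAL_EAS)
selected = draw_households(list(range(200)), random.Random(0))
```

Because every selected enumeration area contributes exactly 25 households, the total sample size is fixed by the number of areas allocated in the first stage.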

Mode of data collection

other

Research instrument

The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.

Cleaning operations

The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observations were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to result in the distributed data files.

Response rate

This is a synthetic dataset; the "response rate" is 100%.
NBA Rookies Performance Statistics and Minutes
kaggle.com
Updated Jan 15, 2023
Cite
The Devastator (2023). NBA Rookies Performance Statistics and Minutes [Dataset]. https://www.kaggle.com/datasets/thedevastator/nba-rookies-performance-statistics-and-minutes-p/versions/2
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 15, 2023
Dataset provided by
Kaggle
Authors
The Devastator
Description
NBA Rookies Performance Statistics and Minutes Played: 1980-2016

Tracking Basketball Prodigies' Growth and Achievements

By Gabe Salzer [source]

About this dataset

This dataset contains essential performance statistics for NBA rookies from 1980-2016. Here you can find minutes-per-game stats, points scored, field goals made and attempted, three-pointers made and attempted, free throws made and attempted (with the respective percentages for each), offensive rebounds, defensive rebounds, assists, steals, blocks, turnovers, efficiency rating and Hall of Fame induction year. It is organized in descending order by minutes played per game as well as draft year. This Kaggle dataset is a resource for basketball analysts who want to understand how rookies have evolved over the years, from their stats to their eventual Hall of Fame induction. With its detail on individual players' performance, the dataset lets you compare performances across different eras in NBA history along with overall trends in rookie statistics. Compare rookies drafted far apart or those who played together, whatever your goal may be!


How to use the dataset

This dataset is perfect for providing insight into the performance of NBA rookies over an extended period of time. The data covers rookie stats from 1980 to 2016 and includes statistics such as points scored, field goals made, free throw percentage, offensive rebounds, defensive rebounds and assists. It also provides the name of each rookie along with the year they were drafted and their Hall of Fame class.

This data set is useful for researching how rookies’ stats have changed over time in order to compare different eras or identify trends in player performance. It can also be used to evaluate players by comparing their stats against those of other players or previous years’ stats.

In order to use this dataset effectively, a few tips are helpful:

Consider using Field Goal Percentage (FG%), Three Point Percentage (3P%) and Free Throw Percentage (FT%) to measure a player’s efficiency beyond just points scored or field goals made/attempted (FGM/FGA).

Look out for anomalies such as low efficiency ratings despite high minutes played, as this could indicate either that a player has not had enough playing time for their statistics to reach what would be their per-game average when playing more minutes, or that they simply did not play well over that short period with limited opportunities.

Try different visualizations with the data, such as histograms, line graphs and scatter plots, because each may offer different insights into the data set, such as comparisons between individual years versus aggregate trends over multiple years.

Lastly, it is important to keep in mind whether you're dealing with cumulative totals over multiple seasons or with individual season averages or per-game numbers when analysing these data!
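
As a small illustration of the first tip, shooting percentages can be derived from made and attempted counts. The dictionary keys below are hypothetical stand-ins, since the file's exact column names are not shown here, and the numbers are made up.

```python
def pct(made, attempted):
    """Return made/attempted as a percentage, or None when there were no attempts."""
    if attempted == 0:
        return None
    return 100.0 * made / attempted


# Hypothetical per-game figures for one rookie.
rookie = {"FGM": 5, "FGA": 10, "3PM": 1, "3PA": 4, "FTM": 3, "FTA": 4}

efficiency = {
    "FG%": pct(rookie["FGM"], rookie["FGA"]),
    "3P%": pct(rookie["3PM"], rookie["3PA"]),
    "FT%": pct(rookie["FTM"], rookie["FTA"]),
}
```

Returning `None` for zero attempts avoids a division error and keeps players with no attempts from being read as 0% shooters.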

Research Ideas

Evaluating the performance of historical NBA rookies over time and how this can help inform future draft picks in the NBA.

Analysing the relative importance of certain performance stats, such as three-point percentage, to overall success and Hall of Fame induction from 1980-2016.

Comparing rookie seasons across different years to identify common trends in terms of statistical contributions and development over time.

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: Dataset copyright by authors

You are free to:
- Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
- Adapt: remix, transform, and build upon the material for any purpose, even commercially.

You must:
- Give appropriate credit: provide a link to the license, and indicate if changes were made.
- ShareAlike: distribute your contributions under the same license as the original.
- Keep intact: all notices that refer to this license, including copyright notices.

Columns

File: NBA Rookies by Year_Hall of Fame Class.csv

| Column name | Description |
|:------------|:------------|
| Name | The name of... |
Hospital Annual Financial Data - Selected Data & Pivot Tables
data.chhs.ca.gov
data.ca.gov
+6more
csv, data, doc, html +4
Updated Apr 23, 2025
Cite
Department of Health Care Access and Information (2025). Hospital Annual Financial Data - Selected Data & Pivot Tables [Dataset]. https://data.chhs.ca.gov/dataset/hospital-annual-financial-data-selected-data-pivot-tables
Explore at:
Available download formats: xlsx, xlsx(14714368), xls(51424256), xlsx(771275), xlsx(769128), xlsx(754073), pdf(333268), xls(920576), pdf(121968), xls(44933632), xlsx(777616), html, xlsx(768036), pdf(310420), doc, xls(16002048), xls(19625472), xls(14657536), pdf(303198), xlsx(770931), xls(19650048), zip, xlsx(758089), xlsx(752914), xls(18445312), xls(51554816), xlsx(756356), xlsx(763636), xlsx(782546), xls, xlsx(758376), pdf(383996), pdf(258239), xls(19599360), data, xlsx(750199), csv(205488092), xlsx(765216), xls(44967936), xls(19577856), xls(18301440), xlsx(779866), xlsx(790979)
Dataset updated
Apr 23, 2025
Dataset authored and provided by
Department of Health Care Access and Information
Description
On an annual basis (individual hospital fiscal year), individual hospitals and hospital systems report detailed facility-level data on services capacity, inpatient/outpatient utilization, patients, revenues and expenses by type and payer, balance sheet and income statement.

Due to the large size of the complete dataset, a selected set of data representing a wide range of commonly used data items, has been created that can be easily managed and downloaded. The selected data file includes general hospital information, utilization data by payer, revenue data by payer, expense data by natural expense category, financial ratios, and labor information.

There are two groups of data contained in this dataset: 1) Selected Data - Calendar Year: To make it easier to compare hospitals by year, hospital reports with report periods ending within a given calendar year are grouped together. The Pivot Tables for a specific calendar year are also found here. 2) Selected Data - Fiscal Year: Hospital reports with report periods ending within a given fiscal year (July-June) are grouped together.
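
The difference between the two groupings can be sketched with a pair of small functions. The function names are hypothetical, and representing a fiscal year as a (start year, end year) pair is an assumption made to avoid guessing how the source labels fiscal years.

```python
from datetime import date


def calendar_year_group(period_end):
    """Calendar-year grouping: the year in which the report period ends."""
    return period_end.year


def fiscal_year_group(period_end):
    """Fiscal-year grouping: the July-June year containing the period end date.

    Returned as (start_year, end_year); the labelling convention is an assumption.
    """
    start_year = period_end.year if period_end.month >= 7 else period_end.year - 1
    return (start_year, start_year + 1)


# Two hypothetical report period end dates in the same calendar year.
june_report = date(2023, 6, 30)
december_report = date(2023, 12, 31)

groups = {
    "june": (calendar_year_group(june_report), fiscal_year_group(june_report)),
    "december": (calendar_year_group(december_report), fiscal_year_group(december_report)),
}
```

Both reports fall in calendar year 2023, but they land in different July-June fiscal years, which is why the two groupings can place the same hospital report alongside different peers.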
Dataset for: Analyzing discrete competing risks data with partially...
wiley.figshare.com
txt
Updated Jun 1, 2023
Cite
Minjung Lee; Eric J Feuer; Zhuoqiao Wang; Hyunsoon Cho; Joe Zou; Benjamin Hankey; Angela B. Mariotto; Jason P Fine (2023). Dataset for: Analyzing discrete competing risks data with partially overlapping or independent data sources and non-standard sampling schemes, with application to cancer registries [Dataset]. http://doi.org/10.6084/m9.figshare.9795572.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.9795572.v1
Dataset updated
Jun 1, 2023
Dataset provided by
Wiley
Authors
Minjung Lee; Eric J Feuer; Zhuoqiao Wang; Hyunsoon Cho; Joe Zou; Benjamin Hankey; Angela B. Mariotto; Jason P Fine
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This paper demonstrates the flexibility of a general approach for the analysis of discrete time competing risks data that can accommodate complex data structures, different time scales for different causes, and nonstandard sampling schemes. The data may involve a single data source where all individuals contribute to analyses of both cause-specific hazard functions, overlapping datasets where some individuals contribute to the analysis of the cause-specific hazard function of only one cause while other individuals contribute to analyses of both cause-specific hazard functions, or separate data sources where each individual contributes to the analysis of the cause-specific hazard function of only a single cause. The approach is modularized into estimation and prediction. For the estimation step, the parameters and the variance-covariance matrix can be estimated using widely available software. The prediction step utilizes a generic program with plug-in estimates from the estimation step. The approach is illustrated with three prognostic models for stage IV male oral cancer using different data structures. The first model uses only men with stage IV oral cancer from population-based registry data. The second model strategically extends the cohort to improve the efficiency of the estimates. The third model improves the accuracy for those with a lower risk of other causes of death, by bringing in an independent data source collected under a complex sampling design with additional other-cause covariates. These analyses represent novel extensions of existing methodology, broadly applicable for the development of prognostic models capturing both the cancer and non-cancer aspects of a patient's health.
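
To make the prediction step more concrete, the sketch below uses the standard discrete-time relation between cause-specific hazards and cumulative incidence: the probability of failing from a cause in an interval is the probability of surviving all causes up to that interval times the cause-specific hazard in it. This is a textbook relation under the assumption of a single shared time scale, not the authors' generic program, and the hazard values are made up.

```python
def cumulative_incidence(hazards):
    """Cumulative incidence per cause from discrete cause-specific hazards.

    hazards: list with one dict per time interval, mapping cause -> hazard.
    Returns a list with one dict per interval, mapping cause -> cumulative
    probability of having failed from that cause by the end of the interval.
    """
    surviving = 1.0   # probability of no event of any cause so far
    totals = {}
    curve = []
    for interval in hazards:
        for cause, hazard in interval.items():
            totals[cause] = totals.get(cause, 0.0) + surviving * hazard
        surviving *= 1.0 - sum(interval.values())
        curve.append(dict(totals))
    return curve


# Hypothetical plug-in hazard estimates for two intervals and two causes.
plug_in_hazards = [
    {"cancer": 0.2, "other": 0.1},
    {"cancer": 0.2, "other": 0.1},
]
incidence = cumulative_incidence(plug_in_hazards)
```

Separating the hazard estimates from this calculation mirrors the modular estimation and prediction split described above: any source of estimates can be plugged into the same prediction routine.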
Data from: World Mineral Statistics Dataset
data.europa.eu
html
Updated Oct 11, 2021
Cite
Bath and North East Somerset Council (2021). World Mineral Statistics Dataset [Dataset]. https://data.europa.eu/set/data/world-mineral-statistics-dataset1
Explore at:
Available download formats: html
Dataset updated
Oct 11, 2021
Dataset authored and provided by
Bath and North East Somerset Council
Description
The Bath and North East Somerset Council has one of the largest databases in the world on the production and trade of minerals. The dataset contains annual production statistics by mass for more than 70 mineral commodities covering the majority of economically important and internationally-traded minerals, metals and mineral-based materials. For each commodity the annual production statistics are recorded for individual countries, grouped by continent. Import and export statistics are also available for years up to 2002. Maintenance of the database is funded by the Science Budget and output is used by government, private industry and others in support of policy, economic analysis and commercial strategy. As far as possible the production data are compiled from primary, official sources. Quality assurance is maintained by participation in such groups as the International Consultative Group on Non-ferrous Metal Statistics. Individual commodity and country tables are available for sale on request.
