By UCI [source]
This dataset provides a detailed look into student performance and engagement. It gives researchers access to numerous salient metrics of academic performance that illuminate a broad spectrum of student behaviour: how students interact with online learning material, quantitative indicators of their academic outcomes, and demographic data such as age group, gender, and prior education level.
The main objective of this dataset is to equip analysts and educators with empirical insights that underpin individualized learning experiences, specifically in identifying cases where students may be 'at risk'. Given that early preventive interventions have been shown to significantly reduce the chances of course or program withdrawal among struggling students, accurate predictive measures such as these can steer pedagogical strategies towards better outcomes.
One unique feature of this dataset is its intricate detailing. Not only does it provide overarching summaries on a per-student basis for each course presentation, but it also furnishes data on assessments (scores and submission dates) along with information on individuals' interactions within VLEs (virtual learning environments), spanning different types such as forums and content pages. Such comprehensive collation across multiple contextual layers helps paint an encompassing portrayal of the student experience that can guide better instructional design.
Due credit must be given when utilizing this database for research purposes: cite Kuzilek et al. (2015), OU Analyse: Analysing At-Risk Students at The Open University, published in Learning Analytics Review, since the analysis methodologies are grounded in that seminal work.
Finally, protection of student privacy is paramount within this dataset's terms and conditions. Stringent anonymization techniques have been applied to sensitive variables, so that profiles, while detailed, cannot be traced back to the original respondents.
How To Use This Dataset:
Understanding Your Objectives: Ideal objectives for using this dataset could be to identify at-risk students before they drop out of a class or program, to improve course design by analyzing how assignments contribute to final grades, or simply to examine relationships between different variables and student performance.
Set up your Analytical Environment: Before starting any analysis, make sure you have an analytical environment where you can load the CSV files included in this dataset. You can use Python notebooks (Jupyter), RStudio, or Tableau-style software if you also want visual representations.
Explore Data Individually: There are seven separate datasets available: Assessments; Courses; Student Assessment; Student Info; Vle (Virtual Learning Environment); Student Registration and Student Vle. Load these CSVs separately into your environment and do an initial exploration of each one: find out what kind of data they contain (numerical/categorical), whether they have missing values, etc.
Merge Datasets: As the core idea is to track a student's journey through multiple courses over time, combining these datasets will provide insights from wider perspectives. One way is to merge them using common key columns such as 'code_module', 'code_presentation', and 'id_student', but how you merge should depend on the question you are trying to answer (a short R sketch of such a merge follows after these steps).
Identify Key Metrics: Your key metrics will depend on your objectives but might include overall grade averages per course or assessment type/student/region/gender/age group, number of clicks in the virtual learning environment, student registration status, etc.
Run Your Analysis: Now you can run queries to analyze the data relevant to your objectives. Try questions like: What factors most strongly predict whether a student will fail an assessment? or How does course difficulty or the number of allotments per week change students' scores?
Visualization: Visualizing your data can be crucial for understanding patterns and relationships between variables. Use graphs like bar plots, heatmaps, and histograms to represent different aspects of your analyses.
Actionable Insights: The final step is interpreting these results in ways that are meaningful for your objectives.
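As a minimal illustration of the merge step above, the R sketch below joins assessment scores to the per-student records and computes one example metric. The file and column names follow the standard OULAD release and are assumptions to check against your own copy.

# Minimal sketch, assuming the OULAD CSVs sit in the working directory and use
# the usual file/column names (adjust to the headers in your copy).
student_info       <- read.csv("studentInfo.csv")
student_assessment <- read.csv("studentAssessment.csv")
assessments        <- read.csv("assessments.csv")

# Attach the module/presentation keys to each assessment score ...
scores <- merge(student_assessment, assessments, by = "id_assessment")

# ... then join the scores to the per-student records on the three key columns.
merged <- merge(student_info, scores,
                by = c("code_module", "code_presentation", "id_student"))

# Example key metric: mean assessment score per module and final result.
aggregate(score ~ code_module + final_result, data = merged, FUN = mean)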
TRACE-P_Merge_Data contains the merged data files created from data collected during the Transport and Chemical Evolution over the Pacific (TRACE-P) suborbital campaign. Data collection for this product is complete. The NASA TRACE-P mission was a part of NASA's Global Tropospheric Experiment (GTE), an assemblage of missions conducted from 1983-2001 with various research goals and objectives. TRACE-P was a multi-organizational campaign with NASA, the National Center for Atmospheric Research (NCAR), and several US universities. TRACE-P deployed its payloads in the Pacific between March and April 2001 with the goal of studying the air chemistry emerging from Asia into the western Pacific. Along with this, TRACE-P had the objective of studying the chemical evolution of the air as it moved away from Asia. In order to accomplish its goals, the NASA DC-8 aircraft and NASA P-3B aircraft were deployed, each equipped with various instrumentation. TRACE-P also relied on ground sites and satellites to collect data. The DC-8 aircraft was equipped with 19 instruments in total, while the P-3B boasted 21. Some instruments on the DC-8 include the Nephelometer, the GCMS, the Nitric Oxide Chemiluminescence, the Differential Absorption Lidar (DIAL), and the Dual Channel Collectors and Fluorometers, HPLC. The Nephelometer was utilized to gather data on aerosol scattering at several wavelengths (450, 550, 700 nm), aerosol absorption (565 nm), equivalent BC mass, and the air density ratio. The GCMS was responsible for capturing a multitude of compounds in the atmosphere, some of which include CH4, CH3CHO, CH3Br, CH3Cl, CHBr3, and C2H6O. DIAL was used for a variety of measurements, some of which include aerosol wavelength dependence (1064/587 nm), IR aerosol scattering ratio (1064 nm), tropopause heights and ozone columns, visible aerosol scattering ratio, composite tropospheric ozone cross-sections, and visible aerosol depolarization. Finally, the Dual Channel Collectors and Fluorometers, HPLC collected data on H2O2, CH3OOH, and CH2O in the atmosphere. The P-3B aircraft was equipped with various instruments for TRACE-P, some of which include the MSA/CIMS, the Non-dispersive IR Spectrometer, the PILS-Ion Chromatograph, and the Condensation Particle Counter and Pulse Height Analysis (PHA). The MSA/CIMS measured OH, H2SO4, MSA, and HNO3. The Non-dispersive IR Spectrometer took measurements of CO2 in the atmosphere. The PILS-Ion Chromatograph recorded measurements of compounds and elements in the atmosphere, including sodium, calcium, potassium, magnesium, chloride, NH4, NO3, and SO4. Finally, the Condensation Particle Counter and PHA were used to gather data on total UCN, UCN 3-8 nm, and UCN 3-4 nm. Along with the aircraft, ground stations measured air quality from China along with C2H2, C2H6, CO, and HCN. Finally, satellite imagery was used to collect a multitude of data; some of the uses were to observe the history of lightning flashes, SeaWiFS cloud imagery, 8-day exposure to TOMS aerosols, and SeaWiFS aerosol optical thickness. The imagery was used to aid in planning the aircraft deployments.
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
This dataset is the repository for the following paper submitted to Data in Brief:
Kempf, M. A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19. Data in Brief (submitted: December 2023).
The Data in Brief article contains the supplementary information and is the related data paper to:
Kempf, M. Climate change, the Arab Spring, and COVID-19 - Impacts on landcover transformations in the Levant. Journal of Arid Environments (revision submitted: December 2023).
Description/abstract
The Levant region is highly vulnerable to climate change, experiencing prolonged heat waves that have led to societal crises and population displacement. Since 2010, the area has been marked by socio-political turmoil, including the Syrian civil war and, currently, the escalation of the so-called Israeli-Palestinian Conflict, which has strained neighbouring countries like Jordan due to the influx of Syrian refugees and has increased the population's vulnerability to governmental decision-making. Jordan, in particular, has seen rapid population growth and significant changes in land-use and infrastructure, leading to over-exploitation of the landscape through irrigation and construction. This dataset uses climate data, satellite imagery, and land cover information to illustrate the substantial increase in construction activity and highlights the intricate relationship between climate change predictions and current socio-political developments in the Levant.
Folder structure
After download, the main folder contains all data; the following subfolders are stored as zipped files:
“code” stores the 9 code chunks described below, used to read, extract, process, analyse, and visualize the data.
“MODIS_merged” contains the 16-days, 250 m resolution NDVI imagery merged from three tiles (h20v05, h21v05, h21v06) and cropped to the study area, n=510, covering January 2001 to December 2022 and including January and February 2023.
“mask” contains a single shapefile, which is the merged product of administrative boundaries, including Jordan, Lebanon, Israel, Syria, and Palestine (“MERGED_LEVANT.shp”).
“yield_productivity” contains .csv files of yield information for all countries listed above.
“population” contains two files with the same name but different format. The .csv file is for processing and plotting in R. The .ods file is for enhanced visualization of population dynamics in the Levant (Socio_cultural_political_development_database_FAO2023.ods).
“GLDAS” stores the raw data of the NASA Global Land Data Assimilation System datasets that can be read, extracted (by variable name), and processed using code “8_GLDAS_read_extract_trend” from the respective folder. One folder contains data from 1975-2022 and a second contains the additional January and February 2023 data.
“built_up” contains the landcover and built-up change data from 1975 to 2022. This folder is subdivided into two subfolders which contain the raw data and the already processed data. “raw_data” contains the unprocessed datasets and “derived_data” stores the cropped built_up datasets at 5-year intervals, e.g., “Levant_built_up_1975.tif”.
Code structure
1_MODIS_NDVI_hdf_file_extraction.R
This is the first code chunk and refers to the extraction of MODIS data from the .hdf file format. The following packages must be installed and the raw data must be downloaded using a simple mass downloader, e.g., from Google Chrome. Packages: terra. Download MODIS data, after registration, from: https://lpdaac.usgs.gov/products/mod13q1v061/ or https://search.earthdata.nasa.gov/search (MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061, last accessed 9th of October 2023). The code reads a list of files, extracts the NDVI, and saves each file to a single .tif file with the indication “NDVI”. Because the study area is quite large, we have to load three spatially different time series and merge them later. Note that the time series are temporally consistent.
2_MERGE_MODIS_tiles.R
In this code, we load and merge the three different stacks to produce a large and consistent time series of NDVI imagery across the study area. We further use the package gtools to load the files in natural order (1, 2, 3, 4, 5, 6, etc.). Here, we have three stacks, from which we merge the first two (stack 1, stack 2) and store them. We then merge this stack with stack 3. We produce single files named NDVI_final_*consecutivenumber*.tif. Before saving the final output of single merged files, create a folder called “merged” and set the working directory to this folder, e.g., setwd("your directory_MODIS/merged").
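A condensed sketch of this step using terra and gtools is shown below; the tile folder names are placeholders for wherever the per-tile .tif files from chunk 1 were saved, and the output is written into the “merged” folder instead of changing the working directory.

library(terra)
library(gtools)   # mixedsort() loads the files in natural order (1, 2, 3, ...)

f1 <- mixedsort(list.files("NDVI_h20v05", pattern = "\\.tif$", full.names = TRUE))  # placeholder folders
f2 <- mixedsort(list.files("NDVI_h21v05", pattern = "\\.tif$", full.names = TRUE))
f3 <- mixedsort(list.files("NDVI_h21v06", pattern = "\\.tif$", full.names = TRUE))

dir.create("merged", showWarnings = FALSE)

for (i in seq_along(f1)) {
  # merge tiles 1 and 2 first, then add tile 3, as described above
  m12 <- merge(rast(f1[i]), rast(f2[i]))
  m   <- merge(m12, rast(f3[i]))
  writeRaster(m, file.path("merged", sprintf("NDVI_final_%03d.tif", i)), overwrite = TRUE)
}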
3_CROP_MODIS_merged_tiles.R
Now we want to crop the derived MODIS tiles to our study area. We are using a mask, which is provided as a .shp file in the repository, named "MERGED_LEVANT.shp". We load the merged .tif files and crop the stack with the vector. Saving to individual files, we name them “NDVI_merged_clip_*consecutivenumber*.tif”. This produces the single cropped NDVI time series data from MODIS.
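In terra terms, this cropping step can be sketched as follows (the input/output folder layout is an assumption):

library(terra)
levant <- vect("MERGED_LEVANT.shp")

files <- list.files("merged", pattern = "^NDVI_final_.*\\.tif$", full.names = TRUE)
for (i in seq_along(files)) {
  r <- rast(files[i])
  r <- crop(r, levant)   # reduce the extent to the bounding box of the mask
  r <- mask(r, levant)   # set cells outside the polygons to NA
  writeRaster(r, sprintf("NDVI_merged_clip_%03d.tif", i), overwrite = TRUE)
}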
The repository provides the already clipped and merged NDVI datasets.
4_TREND_analysis_NDVI.R
Now, we want to perform trend analysis on the derived data. The data we load is tricky, as it contains 16-day composites across each year for a period of 22 years. Growing season sums cover MAM (March-May), JJA (June-August), and SON (September-November). December is represented as a single file, which means that the period DJF (December-February) is represented by 5 images instead of 6. For the last DJF period (December 2022), the data from January and February 2023 can be added. The code selects the respective images from the stack, depending on which period is under consideration. From these stacks, individual annually resolved growing season sums are generated and the slope is calculated. We can then extract the p-values of the trend and flag all values significant at the 0.05 confidence level. Using the ggplot2 package and the melt function from the reshape2 package, we can create a plot of the reclassified NDVI trends together with a local smoother (LOESS, value 0.3).
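The per-pixel slope and p-value extraction can be sketched as follows; season_sums stands in for one of the annually resolved growing-season sum stacks (22 layers = 22 years) and is filled with random values here purely so the snippet runs on its own.

library(terra)

# toy stand-in for the annual growing-season sums (10 x 10 cells, 22 years)
season_sums <- rast(array(runif(10 * 10 * 22), dim = c(10, 10, 22)))

trend_fun <- function(y) {
  if (all(is.na(y))) return(c(NA_real_, NA_real_))
  fit <- summary(lm(y ~ seq_along(y)))
  c(fit$coefficients[2, 1], fit$coefficients[2, 4])   # slope and p-value of the trend
}

trend     <- app(season_sums, trend_fun)              # layer 1 = slope, layer 2 = p-value
slope_sig <- ifel(trend[[2]] < 0.05, trend[[1]], NA)  # keep trends significant at the 0.05 level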
To increase comparability and understand the amplitude of the trends, z-scores were calculated and plotted, which show the deviation of the values from the mean. This has been done for the NDVI values as well as the GLDAS climate variables as a normalization technique.
5_BUILT_UP_change_raster.R
Let us look at the landcover changes now. We are working with the terra package and get raster data from here: https://ghsl.jrc.ec.europa.eu/download.php?ds=bu (last accessed 03 March 2023, 100 m resolution, global coverage). One can download the temporal coverage that is aimed for and reclassify it using the code after cropping to the individual study area. Here, I summed up the different rasters to characterize the built-up change in continuous values between 1975 and 2022.
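One possible way to express that cropping, reclassification, and summing with terra is sketched below; the epoch file names and the reclassification threshold are placeholders, not the exact values used in the repository code.

library(terra)
levant <- vect("MERGED_LEVANT.shp")

epochs <- c("GHS_BUILT_1975.tif", "GHS_BUILT_2000.tif", "GHS_BUILT_2022.tif")  # placeholder names
built  <- mask(crop(rast(epochs), levant), levant)

# reclassify each epoch to built-up presence/absence, then sum across epochs so that
# higher values indicate earlier and more persistent construction
rcl       <- matrix(c(-Inf, 0.5, 0,  0.5, Inf, 1), ncol = 3, byrow = TRUE)
built_bin <- classify(built, rcl)
built_sum <- sum(built_bin)
writeRaster(built_sum, "Levant_built_up_change.tif", overwrite = TRUE)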
6_POPULATION_numbers_plot.R
For this plot, one needs to load the .csv-file “Socio_cultural_political_development_database_FAO2023.csv” from the repository. The ggplot script provided produces the desired plot with all countries under consideration.
7_YIELD_plot.R
In this section, we are using the country productivity data from the supplement in the repository folder “yield_productivity” (e.g., "Jordan_yield.csv"). Each of the single country yield datasets is plotted with ggplot and combined using the patchwork package in R.
8_GLDAS_read_extract_trend
The last code chunk provides the basis for the trend analysis of the climate variables used in the paper. The raw data can be accessed at https://disc.gsfc.nasa.gov/datasets?keywords=GLDAS%20Noah%20Land%20Surface%20Model%20L4%20monthly&page=1 (last accessed 9th of October 2023). The raw data come in .nc file format, and the various variables can be extracted using the [“^a variable name”] command on the SpatRaster collection. Each time you run the code, this variable name must be adjusted to the variable you need (see this link for abbreviations: https://disc.gsfc.nasa.gov/datasets/GLDAS_CLSM025_D_2.0/summary, last accessed 9th of October 2023; or the respective code chunk when reading a .nc file with the ncdf4 package in R), or run print(nc) from the code, or use names() on the SpatRaster collection.
Choosing one variable, the code uses the MERGED_LEVANT.shp mask from the repository to crop and mask the data to the outline of the study area.
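As a rough illustration of these two steps (reading the .nc files, pulling one variable, and masking to the study area), a terra-based sketch might look like the following; the folder name and the variable name "Rainf_tavg" are placeholders to be replaced with the actual download location and the abbreviation you need.

library(terra)

levant <- vect("MERGED_LEVANT.shp")
files  <- list.files("GLDAS", pattern = "\\.nc$", full.names = TRUE)   # monthly GLDAS files

# read only the variable of interest from each file, then stack the layers
rain <- rast(lapply(files, function(f) rast(f, subds = "Rainf_tavg")))

# crop and mask to the outline of the study area
rain <- mask(crop(rain, levant), levant)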
From the processed data, trend analyses are conducted and z-scores are calculated following the code described above. However, annual trends require the frequency of the time series analysis to be set to value = 12. Regarding, e.g., rainfall, which is measured as annual sums and not means, the chunk r.sum=r.sum/12 has to be removed or set to r.sum=r.sum/1 to avoid calculating annual mean values (see other variables). Seasonal subsets can be calculated as described in the code. Here, 3-month subsets were chosen for growing seasons, e.g., March-May (MAM), June-August (JJA), September-November (SON), and DJF (December-February, including Jan/Feb of the consecutive year).
From the data, mean values of 48 consecutive years are calculated and trend analyses are performed as described above. In the same way, p-values are extracted and 95% confidence level values are marked with dots on the raster plot. This analysis can be performed with a much longer time series, other variables, and a different spatial extent across the globe due to the availability of the GLDAS variables.
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Combine population by age cohorts (Children: Under 18 years; Working population: 18-64 years; Senior population: 65 years or more). It lists the population in each age cohort group along with its percentage relative to the total population of Combine. The dataset can be utilized to understand the population distribution across children, working population and senior population for dependency ratio, housing requirements, ageing, migration patterns etc.
Key observations
The largest age group was 18 - 64 years, with a population of 1,672 (61.45% of the total population). Source: U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Age cohorts:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Combine Population by Age. You can refer to it here
analyze the health and retirement study (hrs) with r the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-sectional or shorter panel surveys, respondents keep responding until, well, death do us part. paid for by the national institute on aging and administered by the university of michigan's institute for social research, if you apply for an interviewer job with them, i hope you like werther's original. figuring out how to analyze this data set might trigger your fight-or-flight synapses if you just start clicking around on michigan's website. instead, read pages numbered 10-17 (pdf pages 12-19) of this introduction pdf and don't touch the data until you understand figure a-3 on that last page. if you start enjoying yourself, here's the whole book. after that, it's time to register for access to the (free) data. keep your username and password handy, you'll need it for the top of the download automation r script. next, look at this data flowchart to get an idea of why the data download page is such a righteous jungle. but wait, good news: umich recently farmed out its data management to the rand corporation, who promptly constructed a giant consolidated file with one record per respondent across the whole panel. oh so beautiful. the rand hrs files make much of the older data and syntax examples obsolete, so when you come across stuff like instructions on how to merge years, you can happily ignore them - rand has done it for you. the health and retirement study only includes noninstitutionalized adults when new respondents get added to the panel (as they were in 1992, 1993, 1998, 2004, and 2010) but once they're in, they're in - respondents have a weight of zero for interview waves when they were nursing home residents; but they're still responding and will continue to contribute to your statistics so long as you're generalizing about a population from a previous wave (for example: it's possible to compute "among all americans who were 50+ years old in 1998, x% lived in nursing homes by 2010"). my source for that 411? page 13 of the design doc. wicked.
this new github repository contains five scripts:
1992 - 2010 download HRS microdata.R - loop through every year and every file, download, then unzip everything in one big party
import longitudinal RAND contributed files.R - create a SQLite database (.db) on the local disk, then load the rand, rand-cams, and both rand-family files into the database (.db) in chunks (to prevent overloading ram)
longitudinal RAND - analysis examples.R - connect to the sql database created by the 'import longitudinal RAND contributed files' program, create two database-backed complex sample survey objects using a taylor-series linearization design, and perform a mountain of analysis examples with wave weights from two different points in the panel
import example HRS file.R - load a fixed-width file using only the sas importation script directly into ram with SAScii (http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html), parse through the IF block at the bottom of the sas importation script, blank out a number of variables, and save the file as an R data file (.rda) for fast loading later
replicate 2002 regression.R - connect to the sql database created by the 'import longitudinal RAND contributed files' program, create a database-backed complex sample survey object using a taylor-series linearization design, and exactly match the final regression shown in this document provided by analysts at RAND as an update of the regression on pdf page B76 of this document.
click here to view these five scripts. for more detail about the health and retirement study (hrs), visit: michigan's hrs homepage, rand's hrs homepage, the hrs wikipedia page, a running list of publications using hrs. notes: exemplary work making it this far. as a reward, here's the detailed codebook for the main rand hrs file. note that rand also creates 'flat files' for every survey wave, but really, most every analysis you can think of is possible using just the four files imported with the rand importation script above. if you must work with the non-rand files, there's an example of how to import a single hrs (umich-created) file, but if you wish to import more than one, you'll have to write some for loops yourself. confidential to sas, spss, stata, and sudaan users: a tidal wave is coming. you can get water up your nose and be dragged out to sea, or you can grab a surf board. time to transition to r. :D
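to give a flavor of what those database-backed survey objects look like, here's a rough sketch; the id/strata/weight column names are illustrative (check the rand codebook), and it assumes the survey and RSQLite packages are installed and the rand table has already been loaded into hrs.db by the import script.

library(survey)

hrs_design <-
	svydesign(
		id = ~raehsamp ,          # sampling error computation unit (illustrative name)
		strata = ~raestrat ,      # sampling error stratum (illustrative name)
		weights = ~r10wtresp ,    # respondent weight for the wave you analyze (illustrative name)
		nest = TRUE ,
		data = "rand_hrs" ,       # table name inside the sqlite database (assumption)
		dbtype = "SQLite" ,
		dbname = "hrs.db"
	)

# weighted mean of an example variable (name is illustrative - look it up in the codebook)
svymean( ~r10shlt , hrs_design , na.rm = TRUE )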
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Project Description:
Title: Pandas Data Manipulation and File Conversion
Overview: This project aims to demonstrate the basic functionalities of Pandas, a powerful data manipulation library in Python. In this project, we will create a DataFrame, perform some data manipulation operations using Pandas, and then convert the DataFrame into both Excel and CSV formats.
Key Objectives:
Tools and Libraries Used:
Project Implementation:
DataFrame Creation:
Data Manipulation:
File Conversion:
Convert the DataFrame to an Excel file using the to_excel() function.
Convert the DataFrame to a CSV file using the to_csv() function.
Expected Outcome:
Upon completion of this project, you will have gained a fundamental understanding of how to work with Pandas DataFrames, perform basic data manipulation tasks, and convert DataFrames into different file formats. This knowledge will be valuable for data analysis, preprocessing, and data export tasks in various data science and analytics projects.
Conclusion:
The Pandas library offers powerful tools for data manipulation and file conversion in Python. By completing this project, you will have acquired essential skills that are widely applicable in the field of data science and analytics. You can further extend this project by exploring more advanced Pandas functionalities or integrating it into larger data processing pipelines. In this project, we add a number of records, turn them into a DataFrame, save them to a single Excel file as differently named sheets, and then convert that Excel file into CSV files.
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overall, this project was meant to test the relationship between social media posts and their short-term effect on stock prices. We decided to use Reddit posts from finance-specific subreddit communities like r/wallstreetbets, r/investing, and r/stocks to see the changes in the market associated with a variety of posts made by users. This idea came to light because of the GameStop short squeeze that showed the power of social media in the market. Typically, stock prices should purely represent the total present value of all the future value of the company, but the question we are asking is whether social media can impact that intrinsic value. Our research question was known from the start: do Reddit posts for or against a certain stock provide insight into how the market will move in a short window? To explore this, we selected five large tech companies: Apple, Tesla, Amazon, Microsoft, and Google. These companies would likely give us more data in the subreddits and have less day-to-day volatility, making it easier to simulate an experiment. They trade at very high values, so a change driven by a Reddit post would have to be significant, giving us stronger evidence that there is an effect.
Next, we had to choose our data sources. First, we tried to locate the Reddit data using a Reddit API, but because Reddit requires approval to use their data, we switched to a Kaggle dataset that contained metadata from Reddit. For our second data set we had planned to use Yahoo Finance through yfinance, but due to the large amount of data we were pulling from this public API, our IP address was temporarily blocked. This caused us to switch our second data source to Alpha Vantage. While this was a large switch in public APIs, it was a minor roadblock, and fixing the finance-pulling section allowed everything else to continue to work in succession. Once we had both of our datasets programmatically pulled into our local VS Code environment, we implemented a pipeline to clean, merge, and analyze all the data. At the end, we implemented a Snakemake workflow to ensure the project was easily reproducible. To continue, we utilized TextBlob to label our Reddit posts with a sentiment value of positive, negative, or neutral, giving us a numeric value to correlate with the market. We then matched the time frame of each post with the stock data, computed any price changes, found a correlation coefficient, and graphed our findings.
To conclude the data analysis, we found that there is relatively small or no correlation across the companies overall, but Microsoft and Google do show stronger correlations when analyzed on their own. However, this may be due to other circumstances, like why a post was made or whether the market already had other trends on those dates. A larger analysis with more data from other social media platforms would be needed to support our hypothesis that there is a strong correlation.
As part of this project, we produced a new dataset, which harmonizes numerous existing public opinion surveys from across the world to create a unique global public opinion dataset. These studies consist of over 1,100 individual country-year datasets. Put together, they cover 160 countries and over 3 million respondents.
This research will study the legacy impacts of previous authoritarian regimes on their citizens' political attitudes today. It thereby addresses important and unresolved questions of democratisation by using a new methodological approach, cohort analysis, to examine the lasting legacy of authoritarian dictatorships. Previous research has overlooked the possibility that citizens' formative experiences in non-democratic systems might impact their political attitudes, values, and behaviour even after these regimes have ceased to exist. We expect that these legacy impacts have important implications for the development of a democratic political culture in transitioning societies.
We will hence develop a new theory of authoritarian socialization, which assumes that different authoritarian regimes vary in the way they suppress their citizens, and that this in turn will lead to distinctive beliefs and behaviour in the population. Studying the experience of whole generations (or cohorts as they are also referred to) who have been socialised under dictatorships makes it possible to investigate whether regimes differ in terms of the impact they may have on their citizens' beliefs. Further we are interested in whether and how this imprint might negatively affect the establishment of a democratic political culture. The objective of this project is to develop a typology of regime characteristics and their lasting impact on the population. We expect that this typology and an accompanying policy brief will inform the practical developmental work of organisations working in transitioning societies.
This objective will be achieved by conducting a comprehensive analysis of post-authoritarian countries from different parts of the world during the entire 20th century that experienced different types and durations of suppression. This includes the military regimes in South America, but also the socialist regimes in the former Eastern bloc. It is not possible to study the impact of these regimes during their existence, as representative public opinion research is not possible under dictatorships. We argue, however, that this is not necessary. Instead we rely on the method of cohort analysis, developed by the principal investigator Dr. Neundorf. One of the main methodological innovations of this project is that this method allows us to identify distinct characteristics of those generations that were mainly socialised during dictatorships.
To test our new theory of authoritarian socialisation, we will merge existing survey data from numerous post-authoritarian countries. Today this is possible, as survey research and public opinion polls are widespread beyond established Western democracies. For example, since 1995 several Latin American countries annually take part in the Latinobarometro. Other data that will be used include the World Value Survey (1980-2012), and Asiabarometer (2001-2012) as well as all six rounds of the ESRC-funded European Social Survey (2002-2012). The different survey questions included in the diverse datasets will be harmonised so that a joint analysis is possible. This is a major task of this project and will yield a unique longitudinal, global database of individuals' political attitudes and behaviour.
In order to assign the regime characteristics under which each generation grew up, we will further merge existing data sources (e.g. Polity IV and Autocratic Regime Transitions data) on authoritarian regimes to measure the distinct features of each regime. We will focus on factors such as intra-elite structure; the extent, scope and density of repression; and the transition to democracy. The two datasets of individual-level survey data and regime characteristics will be jointly analysed using quantitative statistical analysis, namely hierarchical age-period-cohort analysis, to estimate the generational differences in democratic attitudes and behaviour.
Precipitation from five kinds of satellite estimates and the NCEP/NCAR reanalysis model output is combined into globally complete monthly precipitation values. The standard products merge only the five kinds of satellite estimates (GPI, OPI, SSM/I scattering, SSM/I emission and MSU). The enhanced products merge the satellite estimates with the blended NCEP/NCAR Reanalysis precipitation. Both products are available as monthly totals, pentads and long-term monthly means on a global 2.5 degree latitude-longitude grid. All data are in netCDF format.
WNA-FLEXPART-BackTraj-1994-2021-Merge is the combined 1994-2021 Western North America Back Trajectory data using the FLEXible PARTicle (FLEXPART) dispersion model. Data collection for this product is complete. Backward simulations of airmass transport using a Lagrangian Particle Dispersion Model (LPDM) framework can establish source-receptor relationships (SRRs), supporting analysis of source contributions from various geospatial regions and atmospheric layers to downwind observations. In this study, we selected receptor locations to match gridded ozone observations over Western North America (WNA) from ozonesonde, lidar, commercial aircraft sampling, and aircraft campaigns (1994-2021). For each receptor, we used the FLEXible PARTicle (FLEXPART) dispersion model, driven by ERA5 reanalysis data, to achieve 15-day backwards SRR calculations, providing global simulations at high temporal (hourly) and spatial (1° x 1°) resolution, from the surface up to 20 km above ground level. This product retains detailed information for each receptor, including the gridded ozone value product, allowing the user to illustrate and identify source contributions to various subsets of ozone observations in the troposphere above WNA over nearly 3 decades at different vertical layers and temporal scales, such as diurnal, daily, seasonal, intra-annual, and decadal. This model product can also support source contribution analyses for other atmospheric components observed over WNA, if other co-located observations have been made at the spatial and temporal scales defined for some or all of the gridded ozone receptors used here. These data products were generated with support from NASA’s Atmospheric Composition Campaign Data Analysis and Modeling program and the NASA High-End Computing (HEC) Program through the NASA Advanced Supercomputing (NAS) Division at Ames Research Center (award SMD-20-28429430). The critical role of in situ measurements collected by many individual research teams, as well as contributions in model development and intermediate analyses, is gratefully acknowledged. For further information, please contact the authors of the manuscripts describing the datasets. Information is included in the ReadMe files, which are provided in the Related URLs.
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
analyze the current population survey (cps) annual social and economic supplement (asec) with r the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show. this new github repository contains three scripts:
2005-2012 asec - download all microdata.R - download the fixed-width file containing household, family, and person records, import by separating this file into three tables, then merge 'em together at the person-level, download the fixed-width file containing the person-level replicate weights, merge the rectangular person-level file with the replicate weights, then store it in a sql database, and create a new variable - one - in the data table
2012 asec - analysis examples.R - connect to the sql database created by the 'download all microdata' program, create the complex sample survey object using the replicate weights, and perform a boatload of analysis examples
replicate census estimates - 2011.R - connect to the sql database created by the 'download all microdata' program, create the complex sample survey object using the replicate weights, and match the sas output shown in the png file below
2011 asec replicate weight sas output.png - statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.
click here to view these three scripts. for more detail about the current population survey - annual social and economic supplement (cps-asec), visit: the census bureau's current population survey page, the bureau of labor statistics' current population survey page, the current population survey's wikipedia article. notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011.
when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research. confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
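for a taste of the replicate-weighted design those scripts construct, here's a rough sketch; it assumes the person-level table (with the pwwgt replicate weight columns merged on) has already been pulled out of the sql database into a data frame called asec, and the fay adjustment follows the census replicate weight instructions document mentioned above - double-check the parameters there.

library(survey)

asec_design <-
	svrepdesign(
		weights = ~marsupwt ,              # final march supplement person weight
		repweights = "pwwgt[1-9]" ,        # person-level replicate weight columns
		type = "Fay" ,
		rho = ( 1 - 1 / sqrt( 4 ) ) ,      # equivalent to the census 4/160 variance factor
		combined.weights = TRUE ,
		data = asec
	)

# weighted population count, using the `one` variable created by the download script
svytotal( ~one , asec_design )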
Abstract from the accompanying documentation article: This article documents Open access article processing charges (OA APC) Main 2016, available for download from the OA APC dataverse, an update and expansion of the preliminary 2015 dataset described in Data [1]. This dataset was gathered as part of Sustaining the Knowledge Commons (SKC), a research program funded by Canada’s Social Sciences and Humanities Research Council. The overall goal of SKC is to advance our collective knowledge about how to transition scholarly publishing from a system dependent on subscriptions and purchase to one that is fully open access. The OA APC Main 2016 dataset was developed as one of the lines of research of SKC: a longitudinal study of the minority (about a third) of fully open access journals that use this business model. Data gathering and analyses will continue on an ongoing basis and will be published annually. We encourage others to share their data as well. In order to merge datasets, note that the two most critical elements for matching data and merging datasets are the journal title and ISSN.
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The National Health and Nutrition Examination Survey (NHANES) provides data with considerable potential for studying the health and environmental exposure of the non-institutionalized US population. However, as NHANES data are plagued with multiple inconsistencies, processing these data is required before deriving new insights through large-scale analyses. Thus, we developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous NHANES (1999-2018), totaling 135,310 participants and 5,078 variables. The variables convey demographics (281 variables), dietary consumption (324 variables), physiological functions (1,040 variables), occupation (61 variables), questionnaires (1,444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood), medications (29 variables), mortality information linked from the National Death Index (15 variables), survey weights (857 variables), environmental exposure biomarker measurements (598 variables), and chemical comments indicating which measurements are below or above the lower limit of detection (505 variables).
csv Data Record: The curated NHANES datasets and the data dictionaries include 23 .csv files and 1 Excel file. The curated NHANES datasets involve 20 .csv-formatted files, two for each module, with one as the uncleaned version and the other as the cleaned version. The modules are labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments. "dictionary_nhanes.csv" is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 5,078 variables in NHANES. "dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables. "dictionary_drug_codes.csv" contains the dictionary for descriptors of the drug codes. "nhanes_inconsistencies_documentation.xlsx" is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES modules.
R Data Record: For researchers who want to conduct their analysis in the R programming language, only the cleaned NHANES modules and the data dictionaries can be downloaded as a .zip file, which includes an .RData file and an .R file. "w - nhanes_1988_2018.RData" contains all the aforementioned datasets as R data objects. We make available all R scripts of the customized functions that were written to curate the data. "m - nhanes_1988_2018.R" shows how we used the customized functions (i.e., our pipeline) to curate the original NHANES data.
Example starter code: The set of starter code to help users conduct exposome analysis consists of four R markdown files (.Rmd). We recommend going through the tutorials in order.
"example_0 - merge_datasets_together.Rmd" demonstrates how to merge the curated NHANES datasets together.
"example_1 - account_for_nhanes_design.Rmd" demonstrates how to conduct a linear regression model, a survey-weighted regression model, a Cox proportional hazard model, and a survey-weighted Cox proportional hazard model.
"example_2 - calculate_summary_statistics.Rmd" demonstrates how to calculate summary statistics for one variable and multiple variables with and without accounting for the NHANES sampling design.
"example_3 - run_multiple_regressions.Rmd" demonstrates how to run multiple regression models with and without adjusting for the sampling design.
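As a minimal sketch of what the starter code covers (joining two cleaned modules and fitting a survey-weighted model), something along the following lines can be used; the file names, the participant identifier, and the design/outcome variable names are illustrative and should be checked against dictionary_nhanes.csv and the relevant weights documentation.

library(survey)

demographics <- read.csv("demographics_clean.csv")   # placeholder file names
response     <- read.csv("response_clean.csv")

# join the modules on the participant identifier
nhanes <- merge(demographics, response, by = "SEQN")

# survey design: PSU, stratum, and weight columns as named in the data dictionary
design <- svydesign(
  id      = ~SDMVPSU,
  strata  = ~SDMVSTRA,
  weights = ~WTMEC2YR,   # use the weight that matches the variables you analyze
  nest    = TRUE,
  data    = nhanes
)

# example survey-weighted regression (variable names are illustrative)
summary(svyglm(BMXBMI ~ RIDAGEYR + RIAGENDR, design = design))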
CAMP2Ex_Merge_Data are pre-generated aircraft merge data files created using data collected during the Clouds, Aerosol and Monsoon Processes-Philippines Experiment (CAMP2Ex) NASA field study. Data collection for this product is complete. CAMP2Ex was a NASA field study with three main science objectives: the aerosol effect on cloud microphysical and optical properties, aerosol and cloud influence on radiation as well as radiative feedback, and the meteorological effect on aerosol distribution and aerosol-cloud interactions. Research on these three main objectives requires a comprehensive characterization of aerosol, cloud, and precipitation properties, as well as the associated meteorological and radiative parameters. Trace gas tracers are also needed for airmass type analysis to characterize the role of anthropogenic and natural aerosols. To deliver these observations, CAMP2Ex utilized a combination of remote sensing and in-situ measurements. NASA's P-3B aircraft was equipped with a suite of in-situ instruments to conduct measurements of aerosol and cloud properties, trace gases, meteorological parameters, and radiative fluxes. The P-3B was also equipped with passive remote sensors (i.e. lidar, polarimeter, radar, and radiometers). A second aircraft, the SPEC Learjet 35A, was primarily dedicated to measuring detailed cloud microphysical properties. The sampling strategy designed for CAMP2Ex coordinated flight plans for both aircraft to maximize the science return. The P-3B was used primarily to conduct remote sensing measurements of cloud and precipitation structure, aerosol layers, and vertical profiles of atmospheric state variables, while the Learjet flew below the P-3B to obtain the detailed cloud microphysical properties. During the 2019 field deployment in the vicinity of the Philippines, completed from August 20-October 10, the P-3B conducted 19 science flights and the SPEC Learjet conducted 11 flights. Ground-based aerosol observations were also recorded in 2018 and 2019. CAMP2Ex was completed in partnership with Philippine research and operational weather communities. Measurements completed during CAMP2Ex provide a 4-D observational view of the environment of the Philippines and its neighboring waters in terms of the microphysical, hydrological, dynamical, thermodynamical and radiative properties of the environment, targeting the environment of shallow cumulus and cumulus congestus clouds.
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset illustrates the median household income in Combine, spanning the years from 2010 to 2023, with all figures adjusted to 2023 inflation-adjusted dollars. Based on the latest 2019-2023 5-Year Estimates from the American Community Survey, it displays how income varied over the last decade. The dataset can be utilized to gain insights into median household income trends and explore income variations.
Key observations:
From 2010 to 2023, the median household income for Combine decreased by $6,989 (7.03%), as per the American Community Survey estimates. In comparison, median household income for the United States increased by $5,602 (7.68%) between 2010 and 2023.
Analyzing the trend in median household income between the years 2010 and 2023, spanning 13 annual cycles, we observed that median household income, when adjusted for 2023 inflation using the Consumer Price Index retroactive series (R-CPI-U-RS), experienced growth year by year for 7 years and declined for 6 years.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. All incomes have been adjusted for inflation and are presented in 2022 inflation-adjusted dollars.
Years for which data is available:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Combine median household income. You can refer to it here
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the household distribution across 16 income brackets among four distinct age groups in Combine: Under 25 years, 25-44 years, 45-64 years, and over 65 years. The dataset highlights the variation in household income, offering valuable insights into economic trends and disparities within different age categories, aiding in data analysis and decision-making.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Income brackets:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Combine median household income by age. You can refer to it here
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents a detailed breakdown of the count of individuals within distinct income brackets, categorizing them by gender (men and women) and employment type, full-time (FT) and part-time (PT), offering valuable insights into the diverse income landscapes within Combine. The dataset can be utilized to gain insights into gender-based income distribution within the Combine population, aiding in data analysis and decision-making.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Income brackets:
Variables / Data Columns
Employment type classifications include:
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Combine median household income by race. You can refer to it here
Name: Data used to rate the relevance of each dimension necessary for a Holistic Environmental Policy Assessment.
Summary: This dataset contains answers from a panel of experts and the public to rate the relevance of each dimension on a scale of 0 (Not relevant at all) to 100 (Extremely relevant).
License: CC-BY-SA
Acknowledge: These data have been collected in the framework of the DECIPHER project. This project has received funding from the European Union’s Horizon Europe programme under grant agreement No. 101056898.
Disclaimer: Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.
Collection Date: 2024-01 / 2024-04
Publication Date: 22/04/2025
DOI: 10.5281/zenodo.13909413
Other repositories: -
Author: University of Deusto
Objective of collection: This data was originally collected to prioritise the dimensions to be further used for Environmental Policy Assessment and IAMs enlarged scope.
Description:
Data Files (CSV)
decipher-public.csv : Public participants' general survey results in the framework of the Decipher project, including socio demographic characteristics and overall perception of each dimension necessary for a Holistic Environmental Policy Assessment.
decipher-risk.csv : Contains individual survey responses regarding prioritisation of dimensions in risk situations. Includes demographic and opinion data from a targeted sample.
decipher-experts.csv : Experts’ opinions collected on risk topics through surveys in the framework of Decipher Project, targeting professionals in relevant fields.
decipher-modelers.csv: Answers given by the developers of models about the characteristics of the models and dimensions covered by them.
prolific_export_risk.csv : Exported survey data from Prolific, focusing specifically on ratings in risk situations. Includes response times, demographic details, and survey metadata.
prolific_export_public_{1,2}.csv : Public survey exports from Prolific, gathering prioritisation of dimensions necessary for environmental policy assessment.
curated.csv : Final cleaned and harmonized dataset combining multiple survey sources. Designed for direct statistical analysis with standardized variable names.
Scripts files (R)
decipher-modelers.R: Script to assess the answers given by modelers about the characteristics of the models.
joint.R: Script to clean and join the RAW answers from the different surveys to retrieve the overall perception of each dimension necessary for a Holistic Environmental Policy Assessment (an illustrative sketch follows below).
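A simplified, illustrative sketch of that kind of joining step is given below; it assumes each survey export shares per-dimension rating columns in a long format, and the column names (dimension, rating) are hypothetical stand-ins for the real ones.

library(dplyr)

public  <- read.csv("decipher-public.csv")
experts <- read.csv("decipher-experts.csv")
risk    <- read.csv("decipher-risk.csv")

# stack the three sources, labelling where each answer came from
joint <- bind_rows(
  mutate(public,  group = "public"),
  mutate(experts, group = "experts"),
  mutate(risk,    group = "risk")
)

# overall perception of each dimension: mean 0-100 rating per group (hypothetical column names)
joint %>%
  group_by(group, dimension) %>%
  summarise(mean_rating = mean(rating, na.rm = TRUE), .groups = "drop")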
Report Files
decipher-modelers.pdf: Diagram with the results of the modelers' survey.
full-Country.html : Full interactive report showing dimension prioritisation broken down by participant country.
full-Gender.html : Visualization report displaying differences in dimension prioritisation by gender.
full-Education.html : Detailed breakdown of dimension prioritisation results based on education level.
full-Work.html : Report focusing on participant occupational categories and associated dimension prioritisation.
full-Income.html : Analysis report showing how income level correlates with dimension prioritisation.
full-PS.html : Report analyzing Political Sensitivity scores across all participants.
full-type.html : Visualization report comparing dimension prioritisation by participant type (public vs. experts) in normal and risk situations.
full-joint-Country.html : Joint analysis report integrating multiple dimensions of country-based dimension prioritisation in normal and risk situations. Combines demographic and response patterns.
full-joint-Gender.html : Combined gender-based analysis across datasets, exploring intersections of demographic factors and dimension prioritisation in normal and risk situations.
full-joint-Education.html : Education-focused report merging various datasets to show consistent or divergent patterns of dimension prioritisation in normal and risk situations.
full-joint-Work.html : Cross-dataset analysis of occupational groups and their dimension prioritisation in normal and risk situations.
full-joint-Income.html : Income-stratified joint analysis, merging public and expert datasets to find common trends and significant differences in dimension prioritisation in normal and risk situations.
full-joint-PS.html : Comprehensive Political Sensitivity score report from merged datasets, highlighting general patterns and subgroup variations in normal and risk situations.
5-star rating: ⭐⭐⭐
Preprocessing steps: The data has been re-coded and cleaned using the scripts provided.
Reuse: NA
Update policy: No more updates are planned.
Ethics and legal aspects: Names of the persons involved have been removed.
Technical aspects:
Other:
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains information about vehicles currently available in GTA V. When merged together, you should have about 567 rows with complete data. While there are a total of 807 vehicles currently in GTA V, the webscraping script failed for some of the vehicle URLs.
This dataset was retrieved via webscraping from gtabase.com. The data is publicly available to everyone.
You'll notice there are 6 CSV files; their contents are described below. When fully merged, there should be 567 rows and 36 columns. The files are separate due to the nature of the webscraping script: I'm still new to webscraping, so I scraped the data in batches. Additionally, I forgot to scrape upgrade_cost in the original script, so I had to do that piece separately as well.
vehicle_links: url for each vehicle in GTA V. May contain duplicates.
v1: Row ID. This can be dropped.
title: Name of the vehicle.
vehicle_class: Vehicle category (Planes, Utility, SUVs, Sports, Super, etc).
manufacturer: Vehicle manufacturer.
features: Vehicle features.
acquisition: Method of obtaining the vehicle in game.
price: Vehicle price.
storage_location: Where the vehicle can be stored in game.
delivery_method: How the vehicle is delivered in game.
modifications: Where the vehicle can be modified in game.
resale_flag: If the vehicle can be resold in game.
resale_price: Resale price of vehicle. Contains 2 values: the normal resale price and the resale price when fully upgraded.
race_availability: Whether the vehicle can be used in races.
top_speed_in_game: Vehicle top speed in game. Contains values for MPH and KMH.
based_on: The real life vehicle that this vehicle is based on.
seats: Number of seats in the vehicle.
weight_in_kg: Vehicle weight (KG).
drive_train: Vehicle drivetrain.
gears: Number of gears in the vehicle.
release_date: Vehicle release date in game.
release_dlc: Name of the DLC the vehicle was released in.
top_speed_real: I believe this is the top speed of the real life vehicle, not the GTA V version.
lap_time: Vehicle lap time in minutes and seconds, in game.
bulletproof: If the vehicle is bulletproof or not.
weapon1_resistance: Resistance to HOMING LAUNCHER / OPPRESSOR MISSILES / JET MISSILES.
weapon2_resistance: Resistance to RPG / GRENADES / STICKY BOMB / MOC CANNON.
weapon3_resistance: Resistance to EXPLOSIVE ROUNDS (HEAVY SNIPER MK II).
weapon4_resistance: Resistance to TANK CANNON (RHINO / APC).
weapon5_resistance: Resistance to ANTI-AIRCRAFT TRAILER DUAL 20MM FLAK.
speed: Vehicle speed score.
acceleration: Vehicle acceleration score.
braking: Vehicle braking score.
handling: Vehicle handling score.
overall: Vehicle overall score.
vehicle_url: Vehicle url.
...1: Row ID. This can be dropped.
upgrade_cost: Upgrade cost for vehicle.
vehicle_url: Vehicle url.
The common key in all datasets is the vehicle_url. This may also be called vehicle_link.
1. Merge the gta_data_batch csvs by rows.
2. Merge the gta_data_upgrade_cost csvs by rows.
3. Left join the gta_data_upgrade_cost to the gta_data_batch using the vehicle_url as the common key.
4. Any vehicle url in the vehicle_links csv that does not have data in the gta_data_batch or gta_data_upgrade_cost files corresponds to a url that the script failed on (a pandas sketch of these merge steps follows below).
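The sketch below assumes the batch and upgrade-cost files match the glob patterns gta_data_batch*.csv and gta_data_upgrade_cost*.csv; those patterns, like the vehicle_link column fallback, are guesses to be adjusted to the actual files in the download.

```python
import glob
import pandas as pd

# Assumed file-name patterns; adjust to the actual CSVs in the download.
batch_files = sorted(glob.glob("gta_data_batch*.csv"))
upgrade_files = sorted(glob.glob("gta_data_upgrade_cost*.csv"))

# Steps 1 and 2: stack the batch files and the upgrade-cost files by rows.
batches = pd.concat((pd.read_csv(f) for f in batch_files), ignore_index=True)
upgrades = pd.concat((pd.read_csv(f) for f in upgrade_files), ignore_index=True)

# The files may contain duplicates, so dedupe on the common key before joining.
batches = batches.drop_duplicates(subset="vehicle_url")
upgrades = upgrades.drop_duplicates(subset="vehicle_url")

# Step 3: left join the upgrade costs onto the batch data on vehicle_url.
vehicles = batches.merge(
    upgrades[["vehicle_url", "upgrade_cost"]], on="vehicle_url", how="left"
)

# Step 4: urls in vehicle_links.csv with no merged data are the ones the
# webscraping script failed on. The column may be named vehicle_link instead.
links = pd.read_csv("vehicle_links.csv")
link_col = "vehicle_link" if "vehicle_link" in links.columns else "vehicle_url"
failed = links.loc[~links[link_col].isin(vehicles["vehicle_url"]), link_col]
print(f"{len(vehicles)} merged rows, {failed.nunique()} urls without data")
```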
The dataset will require some data cleaning. I decided NOT to clean the data before posting, to add an additional challenge. Some files may contain duplicates. Be sure to remove duplicates after merging. Some additional tips on data cleaning -
1. title: Remove the string pattern "GTA 5:"
2. acquisition: Remove the string pattern "/ found"
3. resale_price: Separate into 2 columns to get the normal resale price and the resale price when fully upgraded.
4. top_speed: Get rid of the km/h value; you only need the mph value.
5. upgrade_cost: Remove all non-numeric elements.
6. numeric values: Remove all non-numeric elements from columns that should be numeric, then convert to numeric.
7. Remove leading and trailing white spaces from columns as necessary (a pandas sketch of these cleaning steps follows below).
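This sketch continues from the merged `vehicles` frame in the previous snippet and follows the data-dictionary column names; the exact string patterns in the raw pages may differ, so treat the regular expressions as starting points rather than a definitive recipe.

```python
import re
import pandas as pd

def clean_vehicles(vehicles: pd.DataFrame) -> pd.DataFrame:
    df = vehicles.copy()

    # 1. title: drop the "GTA 5:" prefix.
    df["title"] = df["title"].str.replace("GTA 5:", "", regex=False).str.strip()

    # 2. acquisition: drop the "/ found" fragment.
    df["acquisition"] = df["acquisition"].str.replace("/ found", "", regex=False)

    # 3. resale_price: split the two embedded values into separate columns
    #    (assumes the normal price is listed first, the fully upgraded one second).
    prices = df["resale_price"].astype(str).str.extractall(r"([\d,]+)")[0].unstack()
    df["resale_price_base"] = pd.to_numeric(prices[0].str.replace(",", ""), errors="coerce")
    df["resale_price_upgraded"] = pd.to_numeric(prices[1].str.replace(",", ""), errors="coerce")

    # 4. top_speed_in_game: keep only the mph figure.
    df["top_speed_mph"] = pd.to_numeric(
        df["top_speed_in_game"].str.extract(r"([\d.]+)\s*mph", flags=re.IGNORECASE)[0],
        errors="coerce",
    )

    # 5/6. upgrade_cost and other numeric-looking columns: strip non-numeric characters.
    for col in ["price", "upgrade_cost", "weight_in_kg", "seats", "gears"]:
        if col in df.columns:
            df[col] = pd.to_numeric(
                df[col].astype(str).str.replace(r"[^\d.]", "", regex=True),
                errors="coerce",
            )

    # 7. Trim leading/trailing whitespace in the remaining text columns.
    text_cols = df.select_dtypes(include="object").columns
    df[text_cols] = df[text_cols].apply(lambda s: s.str.strip())

    return df.drop_duplicates(subset="vehicle_url")
```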
This dataset can be used for exploratory data analysis on GTA V vehicles. Some ideas:
- Counts: View counts of vehicles by vehicle class, manufacturer, release DLC, etc.
- Resale value: Which vehicles have the best resale value after taking into account upgrade cost?
- Speed: Which vehicles are best in terms of in-game speed and racing?
- Price: What are the most expensive vehicles by vehicle class, manufacturer, etc.?
- Price: What is the distribution of vehicle prices by vehicle class?
A short pandas sketch of the first two ideas follows below.
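This sketch reuses the hypothetical `vehicles` frame and `clean_vehicles` helper from the earlier snippets; none of these names are part of the dataset itself.

```python
# Continuing from the earlier sketches (hypothetical helper and frame names).
clean_df = clean_vehicles(vehicles)

# Counts: vehicles per class and per manufacturer.
class_counts = clean_df["vehicle_class"].value_counts()
maker_counts = clean_df["manufacturer"].value_counts()

# Resale value: best resale after accounting for upgrade cost
# (uses the derived resale_price_upgraded and cleaned upgrade_cost columns).
clean_df["net_resale"] = clean_df["resale_price_upgraded"] - clean_df["upgrade_cost"]
best_resale = clean_df.nlargest(10, "net_resale")[
    ["title", "vehicle_class", "price", "upgrade_cost", "net_resale"]
]

print(class_counts.head(10), maker_counts.head(10), best_resale, sep="\n\n")
```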
Understanding Society (the UK Household Longitudinal Study), which began in 2009, is conducted by the Institute for Social and Economic Research (ISER) at the University of Essex and the survey research organisations Verian Group (formerly Kantar Public) and NatCen. It builds on, and incorporates, the British Household Panel Survey (BHPS), which began in 1991.
The Understanding Society: Calendar Year Dataset, 2023, is designed for analysts to conduct cross-sectional analysis for the 2023 calendar year. The Calendar Year datasets combine data collected in a specific year from across multiple waves and these are released as separate calendar year studies, with appropriate analysis weights, starting with the 2020 Calendar Year dataset. Each subsequent year, an additional yearly study is released.
The Calendar Year data is designed to enable timely cross-sectional analysis of individuals and households in a calendar year. Such analysis can, however, only involve variables that are collected in every wave (excluding rotating content, which is only collected in some of the waves). Due to overlapping fieldwork, the data files combine data collected in the three waves that make up a calendar year. Analysis cannot be restricted to data collected in one wave during a calendar year, as this subset will not be representative of the population. Further details and guidance on this study can be found in the xxxx_main_survey_calendar_year_user_guide_2023.
These calendar year datasets should be used for cross-sectional analysis only. For those interested in longitudinal analyses using Understanding Society please access the main survey datasets: Safeguarded (End User Licence) version or Safeguarded/Special Licence version.
Understanding Society, the UK Household Longitudinal Study, started in 2009 with a general population sample (GPS) of around 26,000 households of UK residents living in private households and an ethnic minority boost sample (EMBS) of 4,000 households. All members of these responding households and their descendants became part of the core sample and were eligible to be interviewed every year. Anyone who joined these households after this initial wave was also interviewed as long as they lived with these core sample members, to provide the household context. At each annual interview, some basic demographic information is collected about every household member, information about the household is collected from one household member, all household members aged 16+ are eligible for adult interviews, household members aged 10-15 are eligible for youth interviews, and some information is collected about 0-9 year olds from their parents or guardians. From 1991 until 2008/9 a similar survey, the British Household Panel Survey (BHPS), was fielded; the surviving members of this survey sample were incorporated into Understanding Society in 2010. In 2015, an immigrant and ethnic minority boost sample (IEMBS) of around 2,500 households was added. In 2022, a GPS boost sample (GPS2) of around 5,700 households was added. To learn more about the sample design, following rules, interview modes, incentives, consent and questionnaire content, please see the study overview and user guide.
Co-funders
In addition to the Economic and Social Research Council, co-funders for the study included the Department for Work and Pensions, the Department for Education, the Department for Transport, the Department for Culture, Media and Sport, the Department for Communities and Local Government, the Department of Health, the Scottish Government, the Welsh Assembly Government, the Northern Ireland Executive, the Department for Environment, Food and Rural Affairs, and the Food Standards Agency.
End User Licence and Special Licence versions:
There are two versions of the Calendar Year 2023 data. One is available under the standard End User Licence (EUL) agreement, and the other is a Special Licence (SL) version. The SL version contains month and year of birth variables instead of just age, more detailed country and occupation coding for a number of variables and various income variables have not been top-coded (see document '9471_eul_vs_sl_variable_differences' for more details). Users are advised to first obtain the standard EUL version of the data to see if they are sufficient for their research requirements. The SL data have more restrictive access conditions; prospective users of the SL version will need to complete an extra application form and demonstrate to the data owners exactly why they need access to the additional variables in order to get permission to use that version. The main longitudinal versions of the Understanding Society study may be found under SNs 6614 (Safeguarded (EUL)) and 6931 (Safeguarded/SL).
Low- and Medium-level geographical identifiers produced for the mainstage longitudinal dataset can be used with this Calendar Year 2023 dataset, subject to SL access conditions. See the User Guide for further details.
Suitable data analysis software
These data are provided by the depositor in Stata format. Users are strongly advised to analyse them in Stata. Transfer to other formats may result in unforeseen issues. Stata SE or MP software is needed to analyse the larger files, which contain about 1,800 variables.
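If you do need to inspect the files outside Stata (the depositor's caution about format transfer still applies), pandas can read .dta files and lets you load only the variables you need, which helps with the larger ~1,800-variable files. The file and variable names below are placeholders, not the actual names in the deposit.

```python
import pandas as pd

# Placeholder file and variable names; substitute the real ones from the deposit.
wanted = ["pidp", "sex_dv", "age_dv"]
df = pd.read_stata(
    "calendar_year_2023_indresp.dta",
    columns=wanted,             # load only the variables you need
    convert_categoricals=True,  # map Stata value labels onto pandas categories
)
print(df.shape)
print(df.head())
```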