6 datasets found
  1. R codes and dataset for Visualisation of Diachronic Constructional Change using Motion Chart

    • researchdata.edu.au
    Updated Apr 1, 2019
    Gede Primahadi Wijaya Rajeg; Gede Primahadi Wijaya Rajeg (2019). R codes and dataset for Visualisation of Diachronic Constructional Change using Motion Chart [Dataset]. http://doi.org/10.26180/5c844c7a81768
    Dataset provided by
    Monash University
    Authors
    Gede Primahadi Wijaya Rajeg; Gede Primahadi Wijaya Rajeg
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Publication


    Primahadi Wijaya R., Gede. 2014. Visualisation of diachronic constructional change using Motion Chart. In Zane Goebel, J. Herudjati Purwoko, Suharno, M. Suryadi & Yusuf Al Aried (eds.). Proceedings: International Seminar on Language Maintenance and Shift IV (LAMAS IV), 267-270. Semarang: Universitas Diponegoro. doi: https://doi.org/10.4225/03/58f5c23dd8387

    Description of R codes and data files in the repository

    This repository is imported from its GitHub repo, and versioning of this figshare repository follows the GitHub repo's Releases; check the Releases page for updates (the next version is to include the unified, tidyverse-based version of the codes from the first release).

    The raw input data consist of two files (will_INF.txt and go_INF.txt). They represent the co-occurrence frequency of the top-200 infinitival collocates for will and be going to respectively across the twenty decades of the Corpus of Historical American English (COHA), from the 1810s to the 2000s.

    These two input files are used in the R code file 1-script-create-input-data-raw.r. The code preprocesses and combines the two files into a long-format data frame with the following columns: (i) decade, (ii) coll (for "collocate"), (iii) BE going to (the frequency of the collocates with be going to) and (iv) will (the frequency of the collocates with will); the result is saved as input_data_raw.txt.

    Then, the script 2-script-create-motion-chart-input-data.R processes the input_data_raw.txt for normalising the co-occurrence frequency of the collocates per million words (the COHA size and normalising base frequency are available in coha_size.txt). The output from the second script is input_data_futurate.txt.

    Next, input_data_futurate.txt contains the relevant input data for generating (i) the static motion chart as an image plot in the publication (using the script 3-script-create-motion-chart-plot.R), and (ii) the dynamic motion chart (using the script 4-script-motion-chart-dynamic.R).

    The repository adopts the project-oriented workflow in RStudio; double-click on the Future Constructions.Rproj file to open an RStudio session whose working directory is associated with the contents of this repository.
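    The per-million-words normalisation performed by the second script can be illustrated outside R as well. Below is a minimal Python sketch; the function name and figures are hypothetical, and the repository's own implementation lives in 2-script-create-motion-chart-input-data.R using the decade sizes from coha_size.txt.

```python
def per_million(raw_freq, corpus_size):
    """Normalise a raw co-occurrence frequency to a rate per million words."""
    return raw_freq / corpus_size * 1_000_000

# Hypothetical example: a collocate occurring 150 times in a
# 24-million-word COHA decade.
rate = per_million(150, 24_000_000)
print(rate)  # 6.25 occurrences per million words
```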

  2. CLM16gwl NSW Office of Water_GW licence extract linked to spatial locations_CLM_v3_13032014

    • data.gov.au
    • cloud.csiss.gmu.edu
    • +2more
    Updated Nov 19, 2019
    Bioregional Assessment Program (2019). CLM16gwl NSW Office of Water_GW licence extract linked to spatial locations_CLM_v3_13032014 [Dataset]. https://data.gov.au/data/dataset/activity/4b0e74ed-2fad-4608-a743-92163e13c30d
    Dataset provided by
    Bioregional Assessment Program
    Area covered
    New South Wales
    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme. This dataset was derived from multiple datasets. You can find a link to the parent datasets in the Lineage Field in this metadata statement. The History Field in this metadata statement describes how this dataset was derived.

    The difference between NSW Office of Water GW licences - CLM v2 and v3 is that an additional column, 'Asset Class', has been added; it aggregates the purpose of the licence into the set classes for the Asset Database. 'Completed_Depth', the total depth of the groundwater bore, has also been added. These columns were added for the purpose of the Asset Register.

    The aim of this dataset was to map each groundwater works to its volumetric entitlement without double counting the volume, and to allow the data to be aggregated or disaggregated depending on the final use.

    This has not been clipped to the CLM PAE, so the number of economic assets/relevant licences will drastically reduce once this occurs.

    The Clarence Moreton groundwater licences includes an extract of all licences that fell within the data management acquisition area as provided by BA to NSW Office of Water.

    Aim: To get a one to one ratio of licences numbers to bore IDs.

    Important notes about data:

    Data has not been clipped to the PAE.

    No decisions have been made regarding which groundwater purposes should be protected. The purposes therefore currently include groundwater bores drilled for non-extractive purposes, including experimental research, test, monitoring bores, teaching, mineral exploration and groundwater exploration.

    No volume has been included for domestic & stock as it is a basic right. Therefore an arbitrary volume could be applied to account for D&S use.

    Licence Number - Each sheet in the Original Data has a licence number, which is assumed to be the actual licence number. Some are old because they have not been updated to the new WALs; some are new (From_Spreadsheet_WALs). This is the reason for the different codes.

    WA/CA - This number is the 'works' number; it is assumed to indicate the bore permit or works approval, which is why there can be multiple works per licence and multiple licences per works number. (For the complete glossary see http://registers.water.nsw.gov.au/wma/Glossary.jsp.) Originally, the aim was to ensure that, where there was more than one licence per works number or multiple works per licence, the multiple instances were complete.

    Clarence Moreton worksheet links the individual licence to a works and a volumetric entitlement. For most sites, this can be linked to a bore which can be found in the NGIS through the HydroID. (\wron\Project\BA\BA_all\Hydrogeology_National_Groundwater_Information_System_v1.1_Sept2013). This will allow analysis of depths, lithology and hydrostratigraphy where the data exists.

    We can aggregate the data based on water source and water management zone as can be seen in the other worksheets.

    Data available:

    Original Data: any data that was brought in from NSW Office of Water; includes

    Spatial locations provided by NoW - This is exported data from the submitted shape files. Includes the licence numbers (LICENCE) and the bore IDs (WORK_NO). (Refer to lineage NSW Office of Water Groundwater Entitlements Spatial Locations).

    Spreadsheet_WAL - The spread sheet from the submitted data, WLS-EXTRACT_WALs_volume. (Refer to Lineage NSW Office of Water Groundwater Licence Extract CLM- Oct 2013)

    WLS_extracts - The combined spread sheets from the submitted data, WLS-EXTRACT . (Refer to Lineage NSW Office of Water Groundwater Licence Extract CLM- Oct 2013)

    Aggregated share component to water sharing plan, water source and water management zone

    Dataset History

    The difference between NSW Office of Water GW licences - CLM v2 and v3 is that an additional column has been added, 'Asset Class' that aggregates the purpose of the licence into the set classes for the Asset Database.

    Where purpose = domestic; or domestic & stock; or stock then it was classed as 'basic water right'. Where it is listed as both a domestic/stock and a licensed use such as irrigation, it was classed as a 'water access right.' All other take and use were classed as a 'water access right'. Where purpose = drainage, waste disposal, groundwater remediation, experimental research, null, conveyancing, test bore - these were not given an asset class. Monitoring bores were classed as 'Water supply and monitoring infrastructure'

    Depth has also been included which is the completed depth of the bore.

    Instructions

    Procedure: refer to Bioregional assessment data conversion script.docx

    1) The original spreadsheets have multiple licence instances when there is more than one WA/CA number, i.e. more than one works or permit per licence. The aim is to have only one instance per licence.

    2) The individual licence numbers were combined into one column

    3) Using the new column of licence numbers, several vlookups were created to bring in other data. Where columns are identical in the original spreadsheets, they are combined. The only columns not combined are Share/Entitlement/Allocation, as these mean different things.

    4) A Hydro ID column was created; this code links the NSW data to the NGIS and is basically the bore code with ".1.1" appended.

    5) All 'cancelled' licences were removed

    6) A count of the number of works per licence and number of bores were included in the spreadsheet.

    7) Where the ShareComponent = NA, the Entitlement = 0, Allocation = 0 and there was more than one instance of the same bore, this means that the original licence assigned to the bore has been replaced by a new licence with a share component. Where these criteria were met, the instances were removed

    8) A volume per works ensures that the volume of the licence is not repeated for each works, but is divided by the number of works.
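    The volume-per-works step can be sketched in a few lines of Python; the rows, identifiers and volumes below are hypothetical stand-ins for the licence extract, not the actual data.

```python
from collections import Counter

# Hypothetical rows: one row per (licence, works) pair, with the licence's
# full volumetric entitlement repeated on every row.
rows = [
    {"licence": "40WA1", "works": "GW1001", "volume": 900},
    {"licence": "40WA1", "works": "GW1002", "volume": 900},
    {"licence": "40WA1", "works": "GW1003", "volume": 900},
    {"licence": "40WA2", "works": "GW2001", "volume": 500},
]

# Count works per licence, then divide so volumes are not double counted.
works_per_licence = Counter(r["licence"] for r in rows)
for r in rows:
    r["volume_per_works"] = r["volume"] / works_per_licence[r["licence"]]

total = sum(r["volume_per_works"] for r in rows)
print(total)  # 1400.0 - the sum now matches the true total entitlement
```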

    Bioregional assessment data conversion script

    Aim: The following document is the RStudio script for the conversion and merging of the bioregional assessment data.

    Requirements: The user will need RStudio. Some basic knowledge of R is recommended; without it, the only things that would really need to be changed are the file locations and names. R reads file paths differently from Windows, and the directories RStudio reads from depend on where it is set to point, so this must be configured properly before the script can be run.

    Procedure: The information below the dashed line is the script. It can be copied and pasted directly into RStudio. Any text after '#' is not read as script, so instructions can be added as comments.

    ###########
    # 18/2/2014
    # Code by Brendan Dimech
    #
    # Script to merge extract files from submitted NSW bioregional
    # assessment data and convert them into the required format. Also uses a
    # 'vlookup' process to get bore and location information from the NGIS.
    #
    # There are 3 scripts, one for each of the individual regions.
    ############

    # CLARENCE MORETON

    # Opening of files. Locations can be changed if needed.
    # arc.file is the exported *.csv from the NGIS data, which has bore data and lat/long information.
    # Lat/long weren't in the file natively, so they were added to the table using Arc Toolbox tools.

    arc.folder <- '/data/cdc_cwd_wra/awra/wra_share_01/GW_licencing_and_use_data/Rstudio/Data/Vlookup/Data'
    arc.file   <- "Moreton.csv"

    # Files from NSW came through in two types. WALS files included 'newer' licences that had a share
    # component. The 'OTH' files were older licences that had just an allocation. Similar data was
    # combined, and information not shared between the datasets was removed.
    # This section locates and imports the WALS and OTH files.

    WALS.folder <- '/data/cdc_cwd_wra/awra/wra_share_01/GW_licencing_and_use_data/Rstudio/Data/Vlookup/Data'
    WALS.file  <- "GW_Clarence_Moreton_WLS-EXTRACT_4_WALs_volume.xls"
    OTH.file.1 <- "GW_Clarence_Moreton_WLS-EXTRACT_1.xls"
    OTH.file.2 <- "GW_Clarence_Moreton_WLS-EXTRACT_2.xls"
    OTH.file.3 <- "GW_Clarence_Moreton_WLS-EXTRACT_3.xls"
    OTH.file.4 <- "GW_Clarence_Moreton_WLS-EXTRACT_4.xls"

    newWALS.folder <- '/data/cdc_cwd_wra/awra/wra_share_01/GW_licencing_and_use_data/Rstudio/Data/Vlookup/Products'
    newWALS.file   <- "Clarence_Moreton.csv"

    arc  <- read.csv(paste(arc.folder, arc.file, sep = "/"), header = TRUE, sep = ",")
    WALS <- read.table(paste(WALS.folder, WALS.file, sep = "/"), header = TRUE, sep = "\t")

    # Merge the individual OTH files into a single OTH file.
    OTH1 <- read.table(paste(WALS.folder, OTH.file.1, sep = "/"), header = TRUE, sep = "\t")
    OTH2 <- read.table(paste(WALS.folder, OTH.file.2, sep = "/"), header = TRUE, sep = "\t")
    OTH3 <- read.table(paste(WALS.folder, OTH.file.3, sep = "/"), header = TRUE, sep = "\t")
    OTH4 <- read.table(paste(WALS.folder, OTH.file.4, sep = "/"), header = TRUE, sep = "\t")

    OTH <- merge(OTH1, OTH2, all.x = TRUE, all.y = TRUE)
    OTH <- merge(OTH, OTH3, all.x = TRUE, all.y = TRUE)
    OTH <- merge(OTH, OTH4, all.x = TRUE, all.y = TRUE)

    # Add new columns to OTH for the BORE, LAT and LONG, then use match() as a 'vlookup' to pull the
    # corresponding bore and location from the arc file. The WALS and OTH files are slightly
    # different because the arc file has a different licence number added in.
    OTH <- data.frame(OTH, BORE = "", LAT = "", LONG = "")
    OTH$BORE <- arc$WORK_NO[match(OTH$LICENSE.APPROVAL, arc$LICENSE)]
    OTH$LAT  <- arc$POINT_X[match(OTH$LICENSE.APPROVAL, arc$LICENSE)]
    OTH$LONG <-

  3. Processed data for the analysis of human mobility changes from COVID-19 lockdown on bird occupancy in North Carolina, USA

    • data.niaid.nih.gov
    • search.dataone.org
    • +2more
    zip
    Updated Mar 28, 2024
    Jin Bai; Michael Caslin; Madhusudan Katti (2024). Processed data for the analysis of human mobility changes from COVID-19 lockdown on bird occupancy in North Carolina, USA [Dataset]. http://doi.org/10.5061/dryad.gb5mkkwxr
    Dataset provided by
    North Carolina State University
    Authors
    Jin Bai; Michael Caslin; Madhusudan Katti
    License

    CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)

    Area covered
    United States, North Carolina
    Description

    The COVID-19 pandemic lockdowns worldwide provided a unique research opportunity for ecologists to investigate the human-wildlife relationship under abrupt changes in human mobility, also known as the Anthropause. Here we chose 15 common non-migratory bird species with different levels of synanthropy, aiming to compare how changes in human mobility could influence the occupancy of fully synanthropic species such as House Sparrow (Passer domesticus) versus casual to tangential synanthropic species such as White-breasted Nuthatch (Sitta carolinensis). We extracted data from the eBird citizen science project during three study periods in the spring and summer of 2020, when human mobility changed unevenly across different counties in North Carolina. We used the COVID-19 Community Mobility Reports from Google to examine how community mobility changes towards workplaces, an indicator of overall human movements at the county level, could influence bird occupancy.

    Methods

    The data source we used for bird data was eBird, a global citizen science project run by the Cornell Lab of Ornithology. We used the COVID-19 Community Mobility Reports by Google to represent the pause of human activities at the county level in North Carolina. These data are publicly available and were last updated on 10/15/2022. We used forest land cover data from NC One Map, a high-resolution (1-meter pixel) raster dataset from 2016 imagery, to represent canopy cover at each eBird checklist location. We also used the raster data of the 2019 National Land Cover Database to represent the degree of development/impervious surface at each eBird checklist location. All three measurements were used at the highest resolution available. We downloaded the eBird Basic Dataset (EBD) containing the 15 study species from February to June 2020, along with the sampling event data that contains the checklist effort information.
    First, we used the R package auk (version 0.6.0) in R (version 4.2.1) to filter data on the following conditions: (1) date: 02/19/2020 - 03/29/2020; (2) checklist type: stationary; (3) complete checklists; (4) time: 07:00 am - 06:00 pm; (5) checklist duration: 5-20 mins; (6) location: North Carolina. After filtering, we used the zero-fill function from auk to create detection/non-detection data for each study species in NC. Then we used the repeat-visits filter from auk to keep eBird checklist locations where at least 2 checklists (max 10) had been submitted to the same location by the same observer, allowing us to create a hierarchical data frame in which both the detection and state processes can be analyzed using occupancy modeling. This data frame is in a matrix format in which each row represents a sampling location and the columns represent the detection and non-detection of the 2-10 repeat sampling events. For the Google Community Mobility data, we chose the "Workplaces" category of mobility data to analyze the Anthropause effect because it is highly relevant to the pause of human activities in urban areas. The mobility data from Google is a percentage change compared to a baseline for each day. A baseline day represents a normal value for that day of the week over a 5-week period (01/03/2020-02/06/2020). For example, a mobility value of -30.0 for Wake County on Apr 15, 2020, means the overall mobility in Wake County on that day decreased by 30% compared to the baseline day a few months earlier. Because the eBird data we used covers a range of dates rather than each day, we took the average mobility value before, during, and after lockdown in each county in NC. For the environmental variables, we calculated the values in ArcGIS Pro (version 3.1.0). We created a 200 m buffer around each eligible eBird checklist location.
    For the forest cover data, we used "Zonal Statistics as Table" to extract the percentage of forest cover within each checklist location's 200-meter circular buffer. For the National Land Cover Database (NLCD) data, we combined low-, medium-, and high-intensity development as development cover and used "Summarize Within" to extract the percentage of development cover using the polygon version of NLCD. We used a correlation matrix of the three predictors (workplace mobility, percent forest cover, and percent development cover) and found no collinearity. Thus, these three predictors plus the interaction between workplace mobility and percent development cover were the site covariates of the occupancy models. Four detection covariates were considered: time of observation, checklist duration, number of observers, and workplace mobility. These were also not highly correlated. We then merged all data into an unmarked data frame using the "unmarked" R package (version 1.2.5). The unmarked data frame has eBird sampling locations as sites (rows in the data frame) and repeat checklists at the same sampling locations as repeat visits (columns in the data frame).
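    The period-averaging of the Google mobility values described above can be sketched as follows; the counties, dates and values here are hypothetical illustrations, not taken from the actual reports.

```python
# Hypothetical daily workplace-mobility values (% change from baseline)
# keyed by (county, date); the study averages them within each period
# rather than modelling each day separately.
daily = {
    ("Wake", "2020-04-14"): -28.0,
    ("Wake", "2020-04-15"): -30.0,
    ("Wake", "2020-04-16"): -32.0,
    ("Durham", "2020-04-15"): -25.0,
}

def period_mean(county):
    """Average a county's daily mobility values over the period."""
    values = [v for (c, _), v in daily.items() if c == county]
    return sum(values) / len(values)

print(period_mean("Wake"))  # -30.0
```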

  4. [Superseded] Intellectual Property Government Open Data 2019

    • researchdata.edu.au
    • data.gov.au
    Updated Jun 6, 2019
    IP Australia (2019). [Superseded] Intellectual Property Government Open Data 2019 [Dataset]. https://researchdata.edu.au/superseded-intellectual-property-data-2019/2994670
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    IP Australia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    What is IPGOD?

    The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business and individual level. This is the 2019 release of IPGOD.

    How do I use IPGOD?

    IPGOD is large, with millions of data points across up to 40 tables, making it too large to open with Microsoft Excel. Furthermore, analysis often requires information from separate tables, which needs specialised software for merging. We recommend that advanced users interact with the IPGOD data using the right tools, with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scala.

    IP Data Platform

    IP Australia is also providing free trials of a cloud-based analytics platform with the capability to work with large intellectual property datasets, such as IPGOD, through the web browser, without any installation of software.

    References

    The following pages can help you gain an understanding of intellectual property administration and processes in Australia, to support your analysis of the dataset:

    * Patents
    * Trade Marks
    * Designs
    * Plant Breeder's Rights

    Updates

    Tables and columns

    Due to changes in our systems, some tables have been affected.

    * We have added IPGOD 225 and IPGOD 325 to the dataset!
    * The IPGOD 206 table is not available this year.
    * Many tables have been re-built, and as a result may have different columns or different possible values. Please check the data dictionary for each table before use.

    Data quality improvements

    Data quality has been improved across all tables.

    * Null values are simply empty rather than '31/12/9999'.
    * All date columns are now in ISO format 'yyyy-mm-dd'.
    * All indicator columns have been converted to Boolean data type (True/False) rather than Yes/No, Y/N, or 1/0.
    * All tables are encoded in UTF-8.
    * All tables use the backslash \ as the escape character.
    * The applicant name cleaning and matching algorithms have been updated. We believe that this year's method improves the accuracy of the matches. Please note that the "ipa_id" generated in IPGOD 2019 will not match those in previous releases of IPGOD.
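    A normaliser for the legacy indicator encodings mentioned above might look like the following Python sketch; the function and its mappings are illustrative assumptions, not IP Australia's actual cleaning code.

```python
def to_bool(value):
    """Map legacy indicator encodings (Yes/No, Y/N, 1/0) to True/False.

    Empty values stay None, mirroring the move away from sentinel
    values such as '31/12/9999'.
    """
    truthy = {"yes", "y", "1", "true"}
    falsy = {"no", "n", "0", "false"}
    v = str(value).strip().lower()
    if v in truthy:
        return True
    if v in falsy:
        return False
    return None

print([to_bool(x) for x in ("Y", "No", "1", "")])  # [True, False, True, None]
```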

  5. First quarter 2024 / Table BOAMP-SIREN-BUYERS (BSA): a cross between the BOAMP table (DILA) and the Sirene Business Base (INSEE)

    • gimi9.com
    First quarter 2024 / Table BOAMP-SIREN-BUYERS (BSA): a cross between the BOAMP table (DILA) and the Sirene Business Base (INSEE) | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_6644ac7663969d80f6047dd8/
    Description

    Crossing table of the BOAMP table (DILA) with the Sirene Business Base (INSEE) / First Quarter 2024.

    - The BUYER's Siren number (column "SN_30_Siren") is filled in for each ad (column and primary key "B_17_idweb");
    - Several columns facilitating data mining have been added;
    - The names of the original columns have been prefixed, numbered and sorted alphabetically.

    You will find here:
    - The BSA for the first quarter of 2024 in free and open access (csv with semicolon separator, and Parquet);
    - The schema of the BSA table (csv, comma separator);
    - An excerpt from the March 30 BSA (csv, comma separator) to quickly give you an idea of the data in the Datagouv explorer.

    NB: the March 30 extract has its JSON cell columns GESTION, DONNEES, and ANNONCES_ANTERIEURES purged. The deleted data can be found in a nicer format by following the links in the added columns:
    - B_41_GESTION_URL_JSON;
    - B_43_DONNEES_URL_JSON;
    - B_45_ANNONCES_ANTERIEURES_URL_JSON.

    More info: daily, paid updates covering the entire BOAMP 2024 are available on our website under AuFilDuBoamp Downloads; further documentation can be found at AuFilDuBoamp Doc & TP.

    Data sources:
    - SIRENE database of companies and their establishments (SIREN, SIRET), August edition
    - BOAMP API

    To download the first quarter of the BSA with Python, run:

    For the CSV: df = pd.read_csv("https://www.data.gouv.fr/en/datasets/r/63f0d792-148a-4c95-a0b6-9e8ea8b0b34a", dtype='string', sep=';')
    For the Parquet: df = pd.read_parquet("https://www.data.gouv.fr/en/datasets/r/f7a4a76e-ff50-4dc6-bae8-97368081add2")

    Enjoy!

  6. Virtual Reality Balance Disturbance Dataset

    • zenodo.org
    bin
    Updated Oct 31, 2024
    Nuno Ferrete Ribeiro; Nuno Ferrete Ribeiro; Henrique Pires; Cristina P. Santos; Cristina P. Santos; Henrique Pires (2024). Virtual Reality Balance Disturbance Dataset [Dataset]. http://doi.org/10.5281/zenodo.14013468
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nuno Ferrete Ribeiro; Nuno Ferrete Ribeiro; Henrique Pires; Cristina P. Santos; Cristina P. Santos; Henrique Pires
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background and Purpose:

    There are very few publicly available datasets on real-world falls in the scientific literature, due to the rarity of natural falls and the inherent difficulty of gathering biomechanical and physiological data from young subjects or older adults residing in their communities in a non-intrusive and user-friendly manner. This data gap has hindered research on fall prevention strategies. Immersive Virtual Reality (VR) environments provide a unique solution.

    This dataset supports research in fall prevention by providing an immersive VR setup that simulates diverse ecological environments and randomized visual disturbances, aimed at triggering and analyzing balance-compensatory reactions. The dataset is a unique tool for studying human balance responses to VR-induced perturbations, facilitating research that could inform training programs, wearable assistive technologies, and VR-based rehabilitation methods.

    Dataset Content:
    The dataset includes:

    • Kinematic Data: Captured using a full-body Xsens MVN Awinda inertial measurement system, providing detailed movement data at 60 Hz.
    • Muscle Activity (EMG): Recorded at 1111 Hz using Delsys Trigno for tracking muscle contractions.
    • Electrodermal Activity (EDA): Captured at 100.21 Hz with a Shimmer GSR device on the dominant forearm to record physiological responses to perturbations.
    • Metadata: Includes participant demographics (age, height, weight, gender, dominant hand and foot), trial conditions, and perturbation characteristics (timing and type).

    The files are named in the format "ParticipantX_labelled", where X represents the participant's number. Each file is provided in a .mat format, with data already synchronized across different sensor sources. The structure of each file is organized into the following columns:

    • Column 1: Label indicating the visual perturbation applied. 0 means no visual perturbation.
    • Column 2: Timestamp, providing the precise timing of each recorded data point.
    • Column 3: Frame identifier, which can be cross-referenced with the MVN file for detailed motion analysis.
    • Columns 4 to 985: Xsens motion capture features, exported directly from the MVN file.
    • Columns 986 to 993: EMG data - Tibialis Anterior (R&L), Gastrocnemius Medial Head (R&L), Rectus Femoris (R), Semitendinosus (R), External Oblique (R), Sternocleidomastoid (R).
    • Columns 994 to 1008: Shimmer data: Accelerometer (x,y,z), Gyroscope (x,y,z), Magnetometer (x,y,z), GSR Range, Skin Conductance, Skin Resistance, PPG, Pressure, Temperature.
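    Assuming 0-based indexing, the column layout above can be encoded as slices for downstream analysis. This is an illustrative sketch, not part of the dataset; loading the .mat files themselves would use e.g. scipy.io.loadmat, which is omitted here.

```python
# Hypothetical 0-based column slices for the 1008-column synchronized matrix
# described above (label, timestamp, frame, Xsens, EMG, Shimmer).
COLUMNS = {
    "label":     slice(0, 1),      # column 1: visual-perturbation label
    "timestamp": slice(1, 2),      # column 2
    "frame":     slice(2, 3),      # column 3: MVN frame identifier
    "xsens":     slice(3, 985),    # columns 4-985: motion-capture features
    "emg":       slice(985, 993),  # columns 986-993: 8 EMG channels
    "shimmer":   slice(993, 1008), # columns 994-1008: 15 Shimmer channels
}

n_columns = sum(s.stop - s.start for s in COLUMNS.values())
print(n_columns)  # 1008
```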

    In addition, we are also releasing the .MVN and .MVNA files for each participant (1 to 10), which provide comprehensive motion capture data and include the participants' body measurements, respectively. This additional data enables precise body modeling and further in-depth biomechanical analysis.

    Participants & VR Headset:

    Twelve healthy young adults (average age: 25.09 ± 2.81 years; height: 167.82 ± 8.40 cm; weight: 64.83 ± 7.77 kg; 6 males, 6 females) participated in this study (Table 1). Participants met the following criteria: i) healthy locomotion, ii) stable postural balance, iii) age ≥ 18 years, and iv) body weight < 135 kg.

    Participants were excluded if they: i) had any condition affecting locomotion, ii) had epilepsy, vestibular disorders, or other neurological conditions impacting stability, iii) had undergone recent surgeries impacting mobility, iv) were involved in other experimental studies, v) were under judicial protection or guardianship, or vi) experienced complications using VR headsets (e.g., motion sickness).

    All participants provided written informed consent, adhering to the ethical guidelines set by the University of Minho Ethics Committee (CEICVS 063/2021), in compliance with the Declaration of Helsinki and the Oviedo Convention.

    To ensure unbiased reactions, participants were kept unaware of the specific protocol details. Visual disturbances were introduced in a random sequence and at various locations, enhancing the unpredictability of the experiment and simulating a naturalistic response.

    The VR setup involved an HTC Vive Pro headset with two wirelessly synchronized base stations that tracked participants’ head movements within a 5m x 2.5m area. The base stations adjusted the VR environment’s perspective according to head movements, while controllers were used solely for setup purposes.

    Table 1 - Participants' demographic information

    Participant  Height (cm)  Weight (kg)  Age  Gender  Dom. Hand  Dom. Foot
    1            159          56.5         23   F       Right      Right
    2            157          55.3         28   F       Right      Right
    3            174          67.1         31   M       Right      Right
    4            176          73.8         23   M       Right      Right
    5            158          57.3         23   F       Right      Right
    6            181          70.9         27   M       Right      Right
    7            171          73.3         23   M       Right      Right
    8            159          69.2         28   F       Right      Right
    9            177          57.3         22   M       Right      Right
    10           171          75.5         25   M       Right      Right
    11           163          58.1         23   F       Right      Right
    12           168          63.7         25   F       Right      Right

    Data Collection Methodology:

    The experimental protocol was designed to integrate four essential components: (i) precise control over stimuli, (ii) high reproducibility of the experimental conditions, (iii) preservation of ecological validity, and (iv) promotion of real-world learning transfer.

    • Participant Instructions and Familiarization Trial: Before starting, participants were given specific instructions to (i) seek assistance if they experienced motion sickness, (ii) adjust the VR headset for comfort by modifying the lens distance and headset fit, (iii) stay within the defined virtual play area demarcated by a blue boundary, and (iv) complete a familiarization trial. During this trial, participants were encouraged to explore various virtual environments while performing a sequence of three key movements—walking forward, turning around, and returning to the initial location—without any visual perturbations. This familiarization phase helped participants acclimate to the virtual space in a controlled setting.
    • Experimental Protocol and Visual Perturbations: Participants were exposed to 11 different types of visual perturbations as outlined in Table 2, applied across a total of 35 unique perturbation variants (Table 3). Each variant involved the same type of perturbation, such as a clockwise Roll Axis Tilt, but varied in intensity (e.g., rotation speed) and was presented in randomized virtual locations. The selection of perturbation types was grounded in existing literature on visual disturbances. This design ensured that participants experienced a diverse range of visual effects in a manner that maintained ecological validity, supporting the potential for generalization to real-world scenarios where visual perturbations might occur spontaneously.
    • Protocol Flow and Randomized Presentation: Each visual perturbation variant was presented three times, and participants repeated the walking task throughout a session lasting nearly one hour. The task—walking forward, turning around, and returning to the starting point—took place in a 5 m × 2.5 m physical space mirrored in VR, allowing participants to take 7–10 steps before turning. Participants were not told the timing or nature of any perturbation, which could occur unpredictably during the forward walk, adding a realistic element of surprise. After each return to the starting point, participants were relocated to a random position within the virtual environment, with the sequence of positions determined by a randomized, computer-generated order.
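    The randomization scheme described above (35 perturbation variants, each presented three times, in shuffled order, with a random start position for every walk) can be sketched as follows. This is an illustrative assumption, not the authors' implementation; the seed, function names, and uniform position sampling are hypothetical:

    ```python
    import random

    N_VARIANTS = 35      # unique perturbation variants (Table 3)
    REPETITIONS = 3      # each variant is presented three times
    AREA = (5.0, 2.5)    # tracked walking area in metres (x, y)

    def build_schedule(seed=42):
        """Return a shuffled list of (variant_id, start_position) trials."""
        rng = random.Random(seed)
        # 35 variants x 3 repetitions = 105 walks in total
        trials = [v for v in range(N_VARIANTS) for _ in range(REPETITIONS)]
        rng.shuffle(trials)  # participants cannot anticipate the next perturbation
        # draw a random start position inside the tracked area for each walk
        positions = [(rng.uniform(0, AREA[0]), rng.uniform(0, AREA[1]))
                     for _ in trials]
        return list(zip(trials, positions))

    schedule = build_schedule()
    ```

    Fixing the seed makes the computer-generated order reproducible across participants while still appearing unpredictable to them.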

    Table 2 - Visual perturbations' name and parameters (L - Lateral; B - Backward; F - Forward; S - Slip; T - Trip; CW- Clockwise; CCW - Counter-Clockwise)

    Perturbation [Fall Category]                               Parameters
    Roll Axis Tilt – CW [L]                                    [10º, 20º, 30º] during 0.5 s
    Roll Axis Tilt – CCW [L]                                   [10º, 20º, 30º] during 0.5 s
    Support Surface ML Axis Translation – Bidirectional [L]    Discrete Movement (static pauses between movements) – 1
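    For concreteness, the roll-tilt rows of Table 2 can be encoded as a small data structure; the class and field names below are hypothetical, chosen only to illustrate how two directions × three tilt magnitudes yield six of the 35 variants:

    ```python
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Perturbation:
        name: str           # e.g. "Roll Axis Tilt"
        direction: str      # "CW" or "CCW"
        fall_category: str  # "L", "B", "F", "S", or "T" (see Table 2 legend)
        magnitude_deg: float
        duration_s: float

    # two rotation directions, each at three tilt magnitudes, over 0.5 s
    roll_variants = [
        Perturbation("Roll Axis Tilt", d, "L", m, 0.5)
        for d in ("CW", "CCW")
        for m in (10.0, 20.0, 30.0)
    ]
    ```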


Then, the script 2-script-create-motion-chart-input-data.R processes input_data_raw.txt to normalise the collocates' co-occurrence frequencies per million words (the COHA size and normalising base frequency are available in coha_size.txt). The output of this second script is input_data_futurate.txt.
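The normalisation step computes a per-million-words rate. As a minimal language-neutral sketch (shown in Python rather than the repository's R, and using a hypothetical corpus size in place of the actual figures in coha_size.txt):

```python
def per_million(raw_freq, corpus_size):
    """Normalise a raw co-occurrence frequency to tokens per million words."""
    return raw_freq / corpus_size * 1_000_000

# e.g. 250 occurrences in a hypothetical 20-million-word decade sub-corpus
rate = per_million(250, 20_000_000)  # 12.5 per million words
```

Normalising by each decade's sub-corpus size makes collocate frequencies comparable across the twenty COHA decades despite their differing sizes.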

Next, input_data_futurate.txt contains the relevant input data for generating (i) the static motion chart as an image plot in the publication (using the script 3-script-create-motion-chart-plot.R), and (ii) the dynamic motion chart (using the script 4-script-motion-chart-dynamic.R).

The repository adopts the project-oriented workflow in RStudio; double-click on the Future Constructions.Rproj file to open an RStudio session whose working directory is associated with the contents of this repository.
