12 datasets found
  1. Replication Data for Exploring an extinct society through the lens of...

    • dataone.org
    • dataverse.harvard.edu
    Updated Dec 16, 2023
    Cite
    Wieczorek, Oliver; Malzahn, Melanie (2023). Replication Data for Exploring an extinct society through the lens of Habitus-Field theory and the Tocharian text corpus [Dataset]. http://doi.org/10.7910/DVN/UF8DHK
    Dataset updated
    Dec 16, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Wieczorek, Oliver; Malzahn, Melanie
    Description

    The files and workflow allow you to replicate the study titled "Exploring an extinct society through the lens of Habitus-Field theory and the Tocharian text corpus". The study used the CEToM corpus (https://cetom.univie.ac.at/) of Tocharian texts to analyze the life-world of the elites of an extinct society situated in modern northwestern China. To acquire the raw data needed for steps 1 and 2, please contact Melanie Malzahn (melanie.malzahn@univie.ac.at). We conducted a mixed-methods study consisting of close reading, content analysis, and multiple correspondence analysis (MCA). The Excel file "fragments_architecture_combined.xlsx" allows for replication of the MCA and corresponds to the third step of the workflow outlined below.

    We used the following programming languages and packages to prepare and analyze the data. Data preparation and merging were done in Python (version 3.9.10) with the packages pandas (1.5.3), os (3.12.0), re (3.12.0), numpy (1.24.3), gensim (4.3.1), BeautifulSoup4 (4.12.2), pyasn1 (0.4.8), and langdetect (1.0.9). Multiple correspondence analyses were conducted in R (version 4.3.2) with the packages FactoMineR (2.9), factoextra (1.0.7), readxl (1.4.3), tidyverse (2.0.0), ggplot2 (3.4.4), and psych (2.3.9). After requesting the necessary files, please open and execute the scripts in the order outlined below to replicate the analysis.

    Preparatory step: Create a folder for the Python and R scripts downloadable in this repository. Open the file 0_create folders.py and declare a root folder in line 19. This first script generates the following folders:
    • "tarim-brahmi_database": contains the Tocharian dictionaries and text fragments.
    • "dictionaries": contains Tocharian A and Tocharian B vocabularies, including linguistic features such as translations, meanings, and part-of-speech tags. A full overview of the words is provided at https://cetom.univie.ac.at/?words.
    • "fragments": contains the Tocharian text fragments as XML files.
    • "word_corpus_data": will contain Excel files of the corpus data after the first step.
    • "Architectural_terms": contains the data on the architectural terms used in the dataset (e.g., dwelling, house).
    • "regional_data": contains the data on the findspots (Tocharian and modern Chinese equivalents, e.g., Duldur-Akhur and Kucha).
    • "mca_ready_data": the folder in which the Excel file with the merged data will be saved. Note that the prepared file "fragments_architecture_combined.xlsx" can be saved into this directory; this allows you to skip steps 1 and 2 and reproduce the MCA of the content analysis from the third step of the workflow (R script 3_conduct_MCA.R).

    First step - run 1_read_xml-files.py: loops over the XML files in the dictionaries folder and extracts word metadata, including language (Tocharian A or B), keywords, part of speech, lemmata, word etymology, and loan sources. It then loops over the XML text files and extracts a text ID number, language (Tocharian A or B), text title, genre, subgenre, prose type, verse type, the material on which the text is written, medium, findspot, the source text in Tocharian, and the translation where available. After successful feature extraction, the resulting pandas DataFrame is exported to the word_corpus_data folder.
    Second step - run 2_merge_excel_files.py: merges all Excel files (corpus, data on findspots, word data) and reproduces the content analysis, which was originally based on close reading.
    Third step - run 3_conduct_MCA.R: recodes, prepares, and selects the variables needed for the MCA, then produces the descriptive statistics, conducts the MCA, identifies typical texts per dimension, and exports the PNG files uploaded to this repository.
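    For orientation, the sketch below shows the kind of XML-to-DataFrame extraction the first step performs. The tag names, the helper function, and the output file name are illustrative assumptions; the actual CEToM XML schema and the logic of 1_read_xml-files.py may differ.

      # Minimal sketch of a step-1-style extraction; tag names are hypothetical.
      import os
      import pandas as pd
      from bs4 import BeautifulSoup  # the "xml" parser also requires lxml

      def text_of(soup, tag):
          """Return the stripped text of the first matching tag, or None."""
          node = soup.find(tag)
          return node.get_text(strip=True) if node else None

      records = []
      fragments_dir = "tarim-brahmi_database/fragments"  # created by 0_create folders.py
      for filename in os.listdir(fragments_dir):
          if not filename.endswith(".xml"):
              continue
          with open(os.path.join(fragments_dir, filename), encoding="utf-8") as fh:
              soup = BeautifulSoup(fh, "xml")
          records.append({
              "text_id": text_of(soup, "id"),         # hypothetical tag names
              "language": text_of(soup, "language"),  # Tocharian A or B
              "genre": text_of(soup, "genre"),
              "findspot": text_of(soup, "findspot"),
          })

      pd.DataFrame(records).to_excel("word_corpus_data/corpus.xlsx", index=False)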

  2. [Superseded] Intellectual Property Government Open Data 2019

    • demo.dev.magda.io
    csv-geo-au, pdf
    Updated Jan 26, 2022
    Cite
    IP Australia (2022). [Superseded] Intellectual Property Government Open Data 2019 [Dataset]. https://demo.dev.magda.io/dataset/ds-dga-a4210de2-9cbb-4d43-848d-46138fefd271
    Explore at:
    Available download formats: csv-geo-au, pdf
    Dataset updated
    Jan 26, 2022
    Dataset provided by
    IP Australia (http://ipaustralia.gov.au/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    What is IPGOD? The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business and individual level. This is the 2019 release of IPGOD.

    How do I use IPGOD? IPGOD is large, with millions of data points across up to 40 tables, making them too large to open with Microsoft Excel. Furthermore, analysis often requires information from separate tables, which would need specialised software for merging. We recommend that advanced users interact with the IPGOD data using the right tools with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scalar.

    IP Data Platform: IP Australia is also providing free trials of a cloud-based analytics platform with the capabilities to enable working with large intellectual property datasets, such as the IPGOD, through the web browser, without any installation of software.

    References: The following pages can help you gain an understanding of intellectual property administration and processes in Australia to help your analysis of the dataset: Patents, Trade Marks, Designs, Plant Breeder’s Rights.

    Updates - Tables and columns: Due to changes in our systems, some tables have been affected.
    • We have added IPGOD 225 and IPGOD 325 to the dataset!
    • The IPGOD 206 table is not available this year.
    • Many tables have been re-built, and as a result may have different columns or different possible values. Please check the data dictionary for each table before use.

    Data quality improvements: Data quality has been improved across all tables.
    • Null values are simply empty rather than '31/12/9999'.
    • All date columns are now in ISO format 'yyyy-mm-dd'.
    • All indicator columns have been converted to Boolean data type (True/False) rather than Yes/No, Y/N, or 1/0.
    • All tables are encoded in UTF-8.
    • All tables use the backslash \ as the escape character.
    • The applicant name cleaning and matching algorithms have been updated; we believe that this year's method improves the accuracy of the matches. Please note that the "ipa_id" generated in IPGOD 2019 will not match with those in previous releases of IPGOD.
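    Given the conventions listed above (UTF-8 encoding, empty nulls, ISO dates, backslash escapes, True/False indicators), the sketch below shows one way to load an IPGOD table with pandas. The file and column names are illustrative assumptions; consult the data dictionary for the real ones.

      import pandas as pd

      # Hypothetical table and column names; check the IPGOD data dictionary.
      df = pd.read_csv(
          "ipgod_applications.csv",
          encoding="utf-8",   # all tables are UTF-8 encoded
          escapechar="\\",    # backslash is the escape character
      )
      # Date columns are ISO 'yyyy-mm-dd', so they parse unambiguously.
      df["filing_date"] = pd.to_datetime(df["filing_date"], format="%Y-%m-%d")
      # Indicator columns hold True/False; normalise if they arrive as strings.
      if df["is_granted"].dtype == object:
          df["is_granted"] = df["is_granted"].map({"True": True, "False": False})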

  3. Data from: DATASET FOR: A multimodal spectroscopic approach combining...

    • zenodo.org
    • producciocientifica.uv.es
    bin, csv, zip
    Updated Aug 2, 2024
    Cite
    David Perez Guaita; David Perez Guaita (2024). DATASET FOR: A multimodal spectroscopic approach combining mid-infrared and near-infrared for discriminating Gram-positive and Gram-negative bacteria [Dataset]. http://doi.org/10.5281/zenodo.10523185
    Explore at:
    Available download formats: bin, zip, csv
    Dataset updated
    Aug 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Perez Guaita; David Perez Guaita
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset comprises a comprehensive set of files designed for the analysis and 2D correlation of spectral data, specifically focusing on ATR and NIR spectra. It includes MATLAB scripts and supporting functions necessary to replicate the analysis, as well as the raw datasets used in the study. Below is a detailed description of the included files:

    1. Data Analysis:

      • File Name: Data_Analysis.mlx
      • Description: This MATLAB Live Script file contains the main script used for the classification analysis of the spectral data. It includes steps for preprocessing, analysis, and visualization of the ATR and NIR spectra.
    2. 2D Correlation Data Analysis:

      • File Name: Data_Analysis_2Dcorr.mlx
      • Description: This MATLAB Live Script file is similar to the primary analysis script but is specifically tailored for performing 2D correlation analysis on the spectral data. It includes detailed steps and code for executing the 2D correlation.
    3. Functions:

      • Folder Name: Functions
      • Description: This folder contains all the necessary MATLAB function files required to replicate the analyses presented in the scripts. These functions handle various preprocessing steps, calculations, and visualizations.
    4. Datasets:

      • File Names: ATR_dataset.xlsx, NIR_dataset.xlsx, Reference_data.csv
      • Description: These files contain the raw spectral data for the ATR and NIR analyses (two Excel workbooks) along with reference data (a CSV file). The Excel workbooks include multiple sheets with detailed measurements and metadata.

    Usage Notes:

    • Software Requirements:
      • MATLAB is required to run the .mlx files and utilize the functions.
      • PLS_Toolbox: Necessary for certain preprocessing and analysis steps.
      • MIDAS 2010: required for the 2D correlation analysis.
    • Replication: Users can replicate the analyses by running the Data_Analysis.mlx and Data_Analysis_2Dcorr.mlx scripts in MATLAB, ensuring that the Functions folder is in the MATLAB path.
    • Data Handling: The datasets are provided in .xlsx and .csv formats, which can be easily imported into MATLAB or other data analysis software (see the sketch below).
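    As an example of the "other data analysis software" route, the sketch below reads the supplied files into Python with pandas; the sheet selection is an assumption, since the actual workbook layout may differ.

      import pandas as pd

      # Sheet selection is illustrative; inspect the workbooks for the real layout.
      atr = pd.read_excel("ATR_dataset.xlsx", sheet_name=0)
      nir = pd.read_excel("NIR_dataset.xlsx", sheet_name=0)
      reference = pd.read_csv("Reference_data.csv")
      print(atr.shape, nir.shape, reference.shape)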
  4. 2000 Population Census Data Assembly, ANHUI Province; 安徽省2000人口普查统计资料汇编

    • dataverse.harvard.edu
    Updated Oct 30, 2008
    Cite
    Harvard Dataverse (2008). 2000 Population Census Data Assembly, ANHUI Province; 安徽省2000人口普查统计资料汇编 [Dataset]. http://doi.org/10.7910/DVN/9EJO5F
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 30, 2008
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Anhui
    Description

    The data files in this study have been created by merging individual per-county census files available as Microsoft Excel spreadsheets. Thus the study contains 111 data files, 1 file per census category. Each of these files contains merged census data for the following counties: DONGSHI 东市区 (340102), ZHONGSHI 中市区 (340103), XISHI 西市区 (340104), HEFEI JIAOQU 合肥郊区 (340111), CHANGFENG 长丰县 (340121), FEIDONG 肥东县 (340122), FEIXI 肥西县 (340123), JINGHU 镜湖区 (340202), MATANG 马塘区 (340203), XINWU 新芜区 (340204), JIUJIANG 鸠江区 (340207), WUHU 芜湖县 (340221), FANCHANG 繁昌县 (340222), NANLING 南陵县 (340223), DONGSHI 东市区 (340302), ZHONGSHI 中市区 (340303), XISHI 西市区 (340304), BANGBU JIAOQU 蚌埠市郊区 (340311), HUAIYUAN 怀远县 (340321), WUHE 五河县 (340322), GUZHEN 固镇县 (340323), DATONG 大通区 (340402), TIANJIAAN 田家庵区 (340403), XIEJIAJI 谢家集区 (340404), BAGONGSHAN 八公山区 (340405), PANJI 潘集区 (340406), FENGTAI 凤台县 (340421), JINJIAZHUANG 金家庄区 (340502), HUASHAN 花山区 (340503), YUSHAN 雨山区 (340504), XIANGSHAN 向山区 (340505), DANGTU 当涂县 (340521), DUJI 杜集区 (340602), XIANGSHAN 相山区 (340603), LIESHAN 烈山区 (340604), SUIXI 濉溪县 (340621), TONGGUANSHAN 铜官山区 (340702), SHIZISHAN 狮子山区 (340703), TONGLING JIAOQU 铜陵市郊区 (340711), TONGLING 铜陵县 (340721), YINGJIANG 迎江区 (340802), DAGUAN 大观区 (340803), ANQING JIAOQU 安庆市郊区 (340811), HUAINING 怀宁县 (340822), ZONGYANG 枞阳县 (340823), QIANSHAN 潜山县 (340824), TAIHU 太湖县 (340825), SUSONG 宿松县 (340826), WANGJIANG 望江县 (340827), YUEXI 岳西县 (340828), TONGCHENG 桐城市 (340881), TUNXI 屯溪区 (341002), HUANGSHAN 黄山区 (341003), HUIZHOU 徽州区 (341004), SHEXIAN 歙县 (341021), XIUNING 休宁县 (341022), YIXIAN 黟县 (341023), QIMEN 祁门县 (341024), LANGYA 琅琊区 (341102), NANQIAO 南谯区 (341103), LAIAN 来安县 (341122), QUANJIAO 全椒县 (341124), DINGYUAN 定远县 (341125), FENGYANG 凤阳县 (341126), TIANCHANG 天长市 (341181), MINGGUANG 明光市 (341182), YINGZHOU 颍州区 (341202), YINGDONG 颍东区 (341203), YINGQUAN 颍泉区 (341204), LINQUAN 临泉县 (341221), TAIHE 太和县 (341222), FUNAN 阜南县 (341225), YINGSHANG 颍上县 (341226), JIESHOU 界首市 (341282), YONGQIAO 埇桥区 (341302), DANGSHAN 砀山县 (341321), XIAOXIAN 萧县 (341322), LINGBI 灵璧县 (341323), SIXIAN 泗县 (341324), JUCHAO 居巢区 (341402), LUJIANG 庐江县 (341421), WUWEI 无为县 (341422), HANSHAN 含山县 (341423), HEXIAN 和县 (341424), JINAN 金安区 (341502), YUAN 裕安区 (341503), SHOUXIAN 寿县 (341521), HUOQIU 霍邱县 (341522), SHUCHENG 舒城县 (341523), JINZHAI 金寨县 (341524), HUOSHAN 霍山县 (341525), QIAOCHENG 谯城区 (341602), GUOYANG 涡阳县 (341621), MENGCHENG 蒙城县 (341622), LIXIN 利辛县 (341623), GUICHI 贵池区 (341702), DONGZHI 东至县 (341721), SHITAI 石台县 (341722), QINGYANG 青阳县 (341723), XUANZHOU 宣州区 (341802), LANGXI 郎溪县 (341821), GUANGDE 广德县 (341822), JINGXIAN 泾县 (341823), JIXI 绩溪县 (341824), JINGDE 旌德县 (341825), NINGGUO 宁国市 (341881).
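    A hedged sketch of the per-county merge described above, assuming a hypothetical layout of one Excel file per county for a given census category (the actual source spreadsheets may be organised differently):

      import glob
      import pandas as pd

      # One Excel file per county for a single census category (hypothetical paths).
      frames = []
      for path in sorted(glob.glob("anhui_census/category_001/*.xlsx")):
          county = pd.read_excel(path)
          county["source_file"] = path  # keep the provenance of each county file
          frames.append(county)

      # One merged file per census category, as in the study's 111 output files.
      pd.concat(frames, ignore_index=True).to_excel("merged/category_001.xlsx", index=False)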

  5. Data from: Community Database

    • catalog.data.gov
    • gimi9.com
    • +2 more
    Updated Oct 19, 2024
    Cite
    (Point of Contact, Custodian) (2024). Community Database [Dataset]. https://catalog.data.gov/dataset/community-database1
    Dataset updated
    Oct 19, 2024
    Dataset provided by
    (Point of Contact, Custodian)
    Description

    This Excel spreadsheet is the result of merging, at the port level, several of the in-house fisheries databases with other demographic databases such as the U.S. Census. The fisheries databases used include port listings, weighout (dealer) landings, permit information on homeports and owners' cities of residence, dealer permit information, and logbook records. Port names were consolidated in line with USGS and Census conventions, and typographical errors, non-conventional spellings, and other issues were corrected. Each row is a community, and some data may be confidential, since not all communities have three or more reporting entities for the various variables.
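    A hedged sketch of the port-name consolidation step described above; the variant spellings and the alias table are invented for illustration.

      import pandas as pd

      # Illustrative variant-to-canonical mapping; the real consolidation followed
      # USGS and Census naming conventions.
      port_aliases = {
          "New Bedford Hbr": "New Bedford",
          "NEW BEDFORD": "New Bedford",
          "Pt. Judith": "Point Judith",
      }

      landings = pd.DataFrame({
          "port": ["New Bedford Hbr", "NEW BEDFORD", "Pt. Judith"],
          "landings_lbs": [1200, 800, 450],
      })
      landings["port"] = landings["port"].replace(port_aliases)

      # After consolidation, rows for the same community aggregate cleanly.
      print(landings.groupby("port", as_index=False)["landings_lbs"].sum())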

  6. Data from: Electricity Load Forecasting

    • kaggle.com
    Updated Apr 30, 2021
    Cite
    Saurabh Shahane (2021). Electricity Load Forecasting [Dataset]. https://www.kaggle.com/datasets/saurabhshahane/electricity-load-forecasting/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 30, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Saurabh Shahane
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context

    This is a useful dataset to train and test machine learning forecasting algorithms and compare results with the official forecast from weekly pre-dispatch reports. The following considerations should be kept in mind when comparing forecasting results with the weekly pre-dispatch forecast: 1. Saturday is the first day of each weekly forecast and Friday is the last day. 2. A 72-hour gap of unseen records should be kept before the first day to forecast; in other words, each week's forecast should be made with records up to the last hour of the preceding Tuesday.
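    A minimal sketch of the train/forecast split these two rules imply, assuming an hourly DatetimeIndex (the file and column names are illustrative):

      import pandas as pd

      # Hourly load records indexed by datetime (illustrative names).
      df = pd.read_csv("continuous_dataset.csv",
                       parse_dates=["datetime"], index_col="datetime")

      week_start = pd.Timestamp("2020-01-04")                 # a Saturday, 00:00
      week_end = week_start + pd.Timedelta(hours=7 * 24 - 1)  # Friday, 23:00

      # 72-hour gap of unseen records: the last usable hour is Tuesday 23:00.
      train_end = week_start - pd.Timedelta(hours=73)
      train = df.loc[:train_end]
      target = df.loc[week_start:week_end]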

    Data sources provide hourly records. The data composition is the following: 1. Historical electricity load, available on daily post-dispatch reports, from the grid operator (CND). 2. Historical weekly forecasts available on weekly pre-dispatch reports, both from CND. 3. Calendar information related to school periods, from Panama's Ministry of Education. 4. Calendar information related to holidays, from the "When on Earth?" website. 5. Weather variables, such as temperature, relative humidity, precipitation, and wind speed, for three main cities in Panama, from Earthdata.

    Content

    The original data sources provide the post-dispatch electricity load in individual Excel files on a daily basis and the weekly pre-dispatch electricity load forecast in individual Excel files on a weekly basis, both with hourly granularity. Holidays and school-period data are sparse, scattered across websites and PDF files. Weather data is available in daily NetCDF files.

    For simplicity, the published datasets are already pre-processed by merging all data sources on the date-time index: 1. A CSV file containing all records in a single continuous dataset with all variables. 2. A CSV file containing the load forecast from weekly pre-dispatch reports. 3. Two Excel files containing suggested regressors and 14 training/testing dataset pairs as described in the PDF file.
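    A hedged sketch of the kind of date-time-index merge used to build the single continuous dataset; the frame and file names are assumptions.

      import pandas as pd

      # Illustrative inputs, each indexed by an hourly datetime.
      load = pd.read_csv("load.csv", parse_dates=["datetime"], index_col="datetime")
      weather = pd.read_csv("weather.csv", parse_dates=["datetime"], index_col="datetime")
      calendar = pd.read_csv("calendar.csv", parse_dates=["datetime"], index_col="datetime")

      # Left-join everything onto the load records, aligned on the shared index.
      combined = load.join([weather, calendar], how="left")
      combined.to_csv("merged_dataset.csv")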

    Acknowledgements

    Aguilar Madrid, Ernesto (2021), “Short-term electricity load forecasting (Panama case study)”, Mendeley Data, V1, doi: 10.17632/byx7sztj59.1

  7. Table 2-1: Daily water-level data recorded at monitoring sites in or near...

    • data.usgs.gov
    • catalog.data.gov
    Cite
    Scott Prinos; Joann Dixon, Table 2-1: Daily water-level data recorded at monitoring sites in or near Miami-Dade County, Florida, during the 1974-2009 water years [Dataset]. http://doi.org/10.5066/F78S4N0D
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Scott Prinos; Joann Dixon
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    Oct 1, 1973 - Sep 30, 2009
    Area covered
    Florida, Miami-Dade County
    Description

    Excel table providing the daily water level data from the National Park Service, Everglades National Park, the South Florida Water Management District, and the U.S. Geological Survey during 1974-2009 prior to editing. [All data are in the original vertical datum provided by the collecting organizations (see row titled "Datum"). Data were retrieved from the databases of the National Park Service - Everglades National Park, South Florida Water Management District, and U.S. Geological Survey in July and August 2012. The row titled "Notes" describes the subsequent changes that were made during editing in 2012, such as the elimination of redundant or unusable site files and merging of datasets collected at the same site. Notes are made concerning sites with anomalous data points that were eliminated during editing. These points are highlighted in red in the table. For computational purposes this table does not include data qualifiers such as those which indicate which data are estim ...

  8. Image stitching data set

    • ckan.mobidatalab.eu
    txt, xlsx, zip
    Updated Jun 14, 2023
    Cite
    Bundesanstalt für Wasserbau (2023). Image stitching data set [Dataset]. https://ckan.mobidatalab.eu/dataset/imagestitchingrecord
    Explore at:
    Available download formats: zip, txt, xlsx
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    Bundesanstalt für Wasserbau
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the data set for the paper "Automatic merging of separated construction plans of hydraulic structures" submitted for Bautechnik 5/22. The data set is structured as follows:
    • The ZIP file "01 Original Data" contains 233 folders (named after the TU IDs) with the associated partial recordings in TIF format. The TIFs are binary, compressed in CCITT Fax 4 format. 219 TUs are divided into two parts and 14 into three parts, so the original data consists of 480 partial recordings.
    • The ZIP file "02 Interim Results" contains 233 folders (named after the TU IDs) with relevant intermediate results generated during stitching. This includes the input images scaled to 10 MP, visualizations of the feature assignment(s), and the result in downscaled resolution with visualized seam lines.
    • The ZIP file "03_Results" contains the 170 successfully merged plans in high resolution in TIF format.
    • The Excel file "Dataset" contains metadata on the 233 examined TUs, including the DOT graph of the assignment described in the paper, the correctness rating of the results, and the assignment to the presented sources of error.
    The data set was generated with the following metadata query in the IT system Digital Management of Technical Documents (DVtU): Microfilm metadata - TA (partial recording) - Number: "> 1"; Document metadata - Object part: "130 (Wehrwangen, Wehrpillars)" - Object ID no.: "213 (Weir systems)" - Detail: "*[Bb]wehrung*" - Version: "01.00.00".
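    This is not the authors' pipeline, but a minimal OpenCV sketch of the general technique (feature matching, homography estimation, seam finding, blending) that this kind of plan stitching relies on; the file names are placeholders.

      import cv2

      # Placeholder file names for two halves of a scanned plan.
      left = cv2.imread("plan_part_1.tif")
      right = cv2.imread("plan_part_2.tif")

      # The high-level stitcher bundles feature detection, matching, homography
      # estimation, seam finding, and blending; SCANS mode suits flat documents.
      stitcher = cv2.Stitcher_create(cv2.Stitcher_SCANS)
      status, merged = stitcher.stitch([left, right])

      if status == cv2.Stitcher_OK:
          cv2.imwrite("plan_merged.tif", merged)
      else:
          print(f"Stitching failed with status {status}")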

  9. Data for Fig 3 and 4.xlsx -- Article: Combining intransitive and higher...

    • figshare.com
    xlsx
    Updated May 27, 2023
    Cite
    John Vandermeer (2023). Data for Fig 3 and 4.xlsx -- Article: Combining intransitive and higher order effects in a coupled oscillator framework: a case study of an ant community by John Vandermeer and Ivette Perfecto [Dataset]. http://doi.org/10.6084/m9.figshare.23244365.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 27, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    John Vandermeer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Three Excel sheets with location data (x and y coordinates) for the coffee trees in the survey plots presented in Figures 3 and 4 of the article.

  10. IP Australia - [Superseded] Intellectual Property Government Open Data 2019...

    • gimi9.com
    Updated Jul 21, 2018
    Cite
    (2018). IP Australia - [Superseded] Intellectual Property Government Open Data 2019 | gimi9.com [Dataset]. https://gimi9.com/dataset/au_intellectual-property-government-open-data-2019
    Dataset updated
    Jul 21, 2018
    Area covered
    Australia
    Description

    What is IPGOD? The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business and individual level. This is the 2019 release of IPGOD.

    How do I use IPGOD? IPGOD is large, with millions of data points across up to 40 tables, making them too large to open with Microsoft Excel. Furthermore, analysis often requires information from separate tables, which would need specialised software for merging. We recommend that advanced users interact with the IPGOD data using the right tools with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scalar.

    IP Data Platform: IP Australia is also providing free trials of a cloud-based analytics platform with the capabilities to enable working with large intellectual property datasets, such as the IPGOD, through the web browser, without any installation of software.

    References: The following pages can help you gain an understanding of intellectual property administration and processes in Australia to help your analysis of the dataset: Patents, Trade Marks, Designs, Plant Breeder’s Rights.

    Updates - Tables and columns: Due to changes in our systems, some tables have been affected.
    • We have added IPGOD 225 and IPGOD 325 to the dataset!
    • The IPGOD 206 table is not available this year.
    • Many tables have been re-built, and as a result may have different columns or different possible values. Please check the data dictionary for each table before use.

    Data quality improvements: Data quality has been improved across all tables.
    • Null values are simply empty rather than '31/12/9999'.
    • All date columns are now in ISO format 'yyyy-mm-dd'.
    • All indicator columns have been converted to Boolean data type (True/False) rather than Yes/No, Y/N, or 1/0.
    • All tables are encoded in UTF-8.
    • All tables use the backslash \ as the escape character.
    • The applicant name cleaning and matching algorithms have been updated; we believe that this year's method improves the accuracy of the matches. Please note that the "ipa_id" generated in IPGOD 2019 will not match with those in previous releases of IPGOD.

  11. Excel spreadsheet containing, in separate sheets for each figure, the...

    • plos.figshare.com
    xlsx
    Updated Jun 12, 2024
    Cite
    Yassine Cherrak; Miguel Angel Salazar; Koray Yilmaz; Markus Kreuzer; Wolf-Dietrich Hardt (2024). Excel spreadsheet containing, in separate sheets for each figure, the underlying and individual numerical data used for Figs 1B–1D, 2B, 2C, 3A–3D, 4A–4G, S1A, S1B, S1C, S1D, S1E, S1F, [Dataset]. http://doi.org/10.1371/journal.pbio.3002616.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 12, 2024
    Dataset provided by
    PLOS Biology
    Authors
    Yassine Cherrak; Miguel Angel Salazar; Koray Yilmaz; Markus Kreuzer; Wolf-Dietrich Hardt
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Excel spreadsheet containing, in separate sheets for each figure, the underlying and individual numerical data used for Figs 1B–1D, 2B, 2C, 3A–3D, 4A–4G, S1A, S1B, S1C, S1D, S1E, S1F,

  12. Excel file of Excluded articles.

    • plos.figshare.com
    xlsx
    Updated May 6, 2025
    Cite
    Emma Begley; Jason Thomas; Carl Senior (2025). Excel file of Excluded articles. [Dataset]. http://doi.org/10.1371/journal.pone.0322324.s006
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 6, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Emma Begley; Jason Thomas; Carl Senior
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: The incidence and prevalence of neurodegenerative diseases (NDs) are growing worldwide. In an environment where healthcare resources are already stretched, it is important to optimise treatment choice to help alleviate healthcare burden. This rapid review aims to consolidate evidence on factors that influence healthcare professionals (HCPs) to prescribe medication for NDs and map them to theoretical models of behaviour change to identify the behavioural determinants that may support optimised prescribing.

    Methods and findings: Embase and Ovid MEDLINE were used to identify relevant empirical research studies. Screening, data extraction and quality assessment were carried out by three independent reviewers to ensure consistency. Factors influencing prescribing were mapped to the Theoretical Domains Framework (TDF) and key behavioural determinants were described using the Capability, Opportunity, Motivation – Behaviour (COM-B) model. An initial 3,099 articles were identified, of which 53 were included for data extraction. Fifty-six factors influencing prescribing were identified and categorised into patient, HCP or healthcare-system groups, then mapped to TDF and COM-B domains. Prescribing was influenced by the capability of HCPs, namely factors mapped to the decision-making (e.g., patient age or symptom burden) and knowledge (e.g., clinical understanding) behavioural domains. However, most factors related to HCP opportunity, underpinned by factors mapped to the social (e.g., prescribing support or culture) and contextual (e.g., lack of resources or medication availability) domains. Less evidence was available on factors influencing the motivation of HCPs; where evident, these primarily related to HCP beliefs about consequences (e.g., side effects) and professional identity (e.g., level of specialism).

    Conclusions: This systematic analysis of the literature provides an in-depth understanding of the behavioural determinants that may support optimised prescribing practices (e.g., drug costs or pressure from patients' family members). Understanding these factors provides an opportunity to identify relevant intervention functions and behaviour change techniques to target the factors that directly influence HCP prescribing behaviour.
