The files and workflow will allow you to replicate the study titled "Exploring an extinct society through the lens of Habitus-Field theory and the Tocharian text corpus". This study used the CEToM corpus (https://cetom.univie.ac.at/) of Tocharian texts to analyze the life-world of the elites of an extinct society situated in the Tarim Basin in modern western China. To acquire the raw data needed for steps 1 and 2, please contact Melanie Malzahn (melanie.malzahn@univie.ac.at). We conducted a mixed-methods study consisting of close reading, content analysis, and multiple correspondence analysis (MCA). The Excel file titled "fragments_architecture_combined.xlsx" allows for replication of the MCA and corresponds to the third step of the workflow outlined below.

We used the following programming languages and packages to prepare and analyze the data. Data preparation and merging were carried out in Python (version 3.9.10) with the packages pandas (1.5.3), numpy (1.24.3), gensim (4.3.1), BeautifulSoup4 (4.12.2), pyasn1 (0.4.8), and langdetect (1.0.9), as well as the standard-library modules os and re. Multiple correspondence analyses were conducted in R (version 4.3.2) with the packages FactoMineR (2.9), factoextra (1.0.7), readxl (1.4.3), tidyverse (2.0.0), ggplot2 (3.4.4), and psych (2.3.9).

After requesting the necessary files, please open the scripts in the order outlined below and execute them to replicate the analysis.

Preparatory step: Create a folder for the Python and R scripts downloadable in this repository. Open the file 0_create folders.py and declare a root folder in line 19. This first script will generate the following folders:
"tarim-brahmi_database" = contains the Tocharian dictionaries and Tocharian text fragments.
"dictionaries" = contains Tocharian A and Tocharian B vocabularies, including linguistic features such as translations, meanings, part-of-speech tags, etc. A full overview of the words is provided at https://cetom.univie.ac.at/?words.
"fragments" = contains the Tocharian text fragments as XML files.
"word_corpus_data" = will contain the Excel files of the corpus data after the first step.
"Architectural_terms" = contains the data on the architectural terms used in the dataset (e.g. dwelling, house).
"regional_data" = contains the data on the findspots (Tocharian and modern Chinese equivalents, e.g. Duldur-Akhur & Kucha).
"mca_ready_data" = the folder in which the Excel file with the merged data will be saved. Note that the prepared file "fragments_architecture_combined.xlsx" can be saved into this directory; this allows you to skip steps 1 & 2 and reproduce the MCA of the content analysis based on the third step of our workflow (R script titled 3_conduct_MCA.R).

First step - run 1_read_xml-files.py: loops over the XML files in the dictionaries folder and identifies word metadata, including language (Tocharian A or B), keywords, part of speech, lemmata, word etymology, and loan sources. It then loops over the XML text files and extracts a text ID number, language (Tocharian A or B), text title, text genre, text subgenre, prose type, verse type, material on which the text is written, medium, findspot, the source text in Tocharian, and the translation where available. After successful feature extraction, the resulting pandas DataFrame is exported to the word_corpus_data folder. A minimal sketch of this extraction step is shown below.
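For orientation, here is a minimal, hedged Python sketch of the kind of extraction 1_read_xml-files.py performs on the fragment files; the tag names (language, title, genre, findspot) are illustrative assumptions and may not match the actual CEToM markup:

import os
import pandas as pd
from bs4 import BeautifulSoup  # the "xml" parser below requires lxml to be installed

FRAGMENTS_DIR = "tarim-brahmi_database/fragments"  # created by 0_create folders.py

def text_of(soup, tag_name):
    """Return the stripped text of the first matching tag, or None if absent."""
    tag = soup.find(tag_name)
    return tag.get_text(strip=True) if tag else None

records = []
for filename in sorted(os.listdir(FRAGMENTS_DIR)):
    if not filename.endswith(".xml"):
        continue
    with open(os.path.join(FRAGMENTS_DIR, filename), encoding="utf-8") as handle:
        soup = BeautifulSoup(handle, "xml")
    records.append({
        "text_id": filename.replace(".xml", ""),
        "language": text_of(soup, "language"),   # placeholder tag names
        "title": text_of(soup, "title"),
        "genre": text_of(soup, "genre"),
        "findspot": text_of(soup, "findspot"),
    })

corpus = pd.DataFrame(records)
corpus.to_excel("word_corpus_data/fragments.xlsx", index=False)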
Second step - run 2_merge_excel_files.py: merges all Excel files (corpus, data on findspots, word data) and reproduces the content analysis, which was originally based on close reading; a minimal sketch of the merging logic is shown below. Third step - run 3_conduct_MCA.R: recodes, prepares, and selects the variables necessary to conduct the MCA, then produces the descriptive values before conducting the MCA, identifying typical texts per dimension, and exporting the PNG files uploaded to this repository.
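A hedged sketch of the merging step in 2_merge_excel_files.py; the file names other than fragments_architecture_combined.xlsx and all column names are assumptions made for illustration:

import pandas as pd

corpus = pd.read_excel("word_corpus_data/fragments.xlsx")
findspots = pd.read_excel("regional_data/findspots.xlsx")               # hypothetical file name
terms = pd.read_excel("Architectural_terms/architectural_terms.xlsx")   # hypothetical file name

# Attach regional information to each fragment via its findspot,
# then flag fragments whose source text contains an architectural term.
merged = corpus.merge(findspots, on="findspot", how="left")
pattern = "|".join(terms["term"].dropna().astype(str))                  # hypothetical column "term"
merged["architecture"] = merged["source_text"].str.contains(pattern, case=False, na=False)

merged.to_excel("mca_ready_data/fragments_architecture_combined.xlsx", index=False)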
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
What is IPGOD? The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business and individual level. This is the 2019 release of IPGOD.

How do I use IPGOD? IPGOD is large, with millions of data points across up to 40 tables, making them too large to open with Microsoft Excel. Furthermore, analysis often requires information from separate tables which would need specialised software for merging. We recommend that advanced users interact with the IPGOD data using the right tools with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scalar.

IP Data Platform: IP Australia is also providing free trials to a cloud-based analytics platform with the capabilities to enable working with large intellectual property datasets, such as the IPGOD, through the web browser, without any installation of software.

References: The following pages can help you gain an understanding of intellectual property administration and processes in Australia to support your analysis of the dataset: Patents, Trade Marks, Designs, Plant Breeder's Rights.

Updates - Tables and columns: Due to changes in our systems, some tables have been affected. We have added IPGOD 225 and IPGOD 325 to the dataset! The IPGOD 206 table is not available this year. Many tables have been re-built, and as a result may have different columns or different possible values. Please check the data dictionary for each table before use.

Data quality improvements: Data quality has been improved across all tables. Null values are simply empty rather than '31/12/9999'. All date columns are now in ISO format 'yyyy-mm-dd'. All indicator columns have been converted to Boolean data type (True/False) rather than Yes/No, Y/N, or 1/0. All tables are encoded in UTF-8. All tables use the backslash \ as the escape character. The applicant name cleaning and matching algorithms have been updated. We believe that this year's method improves the accuracy of the matches. Please note that the "ipa_id" generated in IPGOD 2019 will not match with those in previous releases of IPGOD.
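Given the documented conventions (UTF-8 encoding, backslash escape character, ISO dates, empty nulls) and the need to merge tables, a minimal pandas sketch for loading and joining two IPGOD tables might look like the following; the file names and the date column are assumptions, only ipa_id comes from the description above:

import pandas as pd

# Hypothetical table files; IPGOD tables are too large for Excel but load fine in pandas.
applicants = pd.read_csv(
    "ipgod_applicants.csv",
    encoding="utf-8",
    escapechar="\\",   # tables use the backslash as escape character
)
patents = pd.read_csv(
    "ipgod_patents.csv",
    encoding="utf-8",
    escapechar="\\",
    parse_dates=["filing_date"],  # hypothetical ISO-formatted ('yyyy-mm-dd') date column
)

# Cross-table analysis requires a merge, e.g. on the applicant identifier.
combined = patents.merge(applicants, on="ipa_id", how="left")
print(combined.head())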
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises a comprehensive set of files designed for the analysis and 2D correlation of spectral data, specifically focusing on ATR and NIR spectra. It includes MATLAB scripts and supporting functions necessary to replicate the analysis, as well as the raw datasets used in the study. Below is a detailed description of the included files:
Data Analysis: Data_Analysis.mlx
2D Correlation Data Analysis: Data_Analysis_2Dcorr.mlx
Functions: Functions (folder of supporting functions)
Datasets: ATR_dataset.xlsx, NIR_dataset.xlsx, Reference_data.csv
To replicate the analysis, run the Data_Analysis.mlx and Data_Analysis_2Dcorr.mlx scripts in MATLAB, ensuring that the Functions folder is in the MATLAB path; for readers working outside MATLAB, a sketch of the 2D correlation step is given below.
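A minimal Python sketch (not part of the provided MATLAB workflow) of Noda's generalized 2D correlation, assuming each row of ATR_dataset.xlsx is one spectrum and each column one wavenumber; the actual spreadsheet layout may differ:

import numpy as np
import pandas as pd

# Assumed layout: rows = samples (in perturbation order), columns = wavenumbers.
spectra = pd.read_excel("ATR_dataset.xlsx").to_numpy(dtype=float)
m, _ = spectra.shape

# Dynamic spectra: deviations from the mean spectrum.
dynamic = spectra - spectra.mean(axis=0)

# Synchronous 2D correlation spectrum.
synchronous = dynamic.T @ dynamic / (m - 1)

# Asynchronous spectrum via the Hilbert-Noda transformation matrix.
j, k = np.meshgrid(np.arange(m), np.arange(m), indexing="ij")
with np.errstate(divide="ignore"):
    noda = np.where(j == k, 0.0, 1.0 / (np.pi * (k - j)))
asynchronous = dynamic.T @ noda @ dynamic / (m - 1)

print(synchronous.shape, asynchronous.shape)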
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The data files in this study have been created by merging individual per-county census files available as Microsoft Excel spreadsheets. Thus the study contains 111 data files, 1 file per census category. Each of these files contains merged census data for the following counties: DONGSHI 东市区 (340102), ZHONGSHI 中市区 (340103), XISHI 西市区 (340104), HEFEI JIAOQU 合肥郊区 (340111), CHANGFENG 长丰县 (340121), FEIDONG 肥东县 (340122), FEIXI 肥西县 (340123), JINGHU 镜湖区 (340202), MATANG 马塘区 (340203), XINWU 新芜区 (340204), JIUJIANG 鸠江区 (340207), WUHU 芜湖县 (340221), FANCHANG 繁昌县 (340222), NANLING 南陵县 (340223), DONGSHI 东市区 (340302), ZHONGSHI 中市区 (340303), XISHI 西市区 (340304), BANGBU JIAOQU 蚌埠市郊区 (340311), HUAIYUAN 怀远县 (340321), WUHE 五河县 (340322), GUZHEN 固镇县 (340323), DATONG 大通区 (340402), TIANJIAAN 田家庵区 (340403), XIEJIAJI 谢家集区 (340404), BAGONGSHAN 八公山区 (340405), PANJI 潘集区 (340406), FENGTAI 凤台县 (340421), JINJIAZHUANG 金家庄区 (340502), HUASHAN 花山区 (340503), YUSHAN 雨山区 (340504), XIANGSHAN 向山区 (340505), DANGTU 当涂县 (340521), DUJI 杜集区 (340602), XIANGSHAN 相山区 (340603), LIESHAN 烈山区 (340604), SUIXI 濉溪县 (340621), TONGGUANSHAN 铜官山区 (340702), SHIZISHAN 狮子山区 (340703), TONGLING JIAOQU 铜陵市郊区 (340711), TONGLING 铜陵县 (340721), YINGJIANG 迎江区 (340802), DAGUAN 大观区 (340803), ANQING JIAOQU 安庆市郊区 (340811), HUAINING 怀宁县 (340822), ZONGYANG 枞阳县 (340823), QIANSHAN 潜山县 (340824), TAIHU 太湖县 (340825), SUSONG 宿松县 (340826), WANGJIANG 望江县 (340827), YUEXI 岳西县 (340828), TONGCHENG 桐城市 (340881), TUNXI 屯溪区 (341002), HUANGSHAN 黄山区 (341003), HUIZHOU 徽州区 (341004), SHEXIAN 歙县 (341021), XIUNING 休宁县 (341022), YIXIAN 黟县 (341023), QIMEN 祁门县 (341024), LANGYA 琅琊区 (341102), NANQIAO 南谯区 (341103), LAIAN 来安县 (341122), QUANJIAO 全椒县 (341124), DINGYUAN 定远县 (341125), FENGYANG 凤阳县 (341126), TIANCHANG 天长市 (341181), MINGGUANG 明光市 (341182), YINGZHOU 颍州区 (341202), YINGDONG 颍东区 (341203), YINGQUAN 颍泉区 (341204), LINQUAN 临泉县 (341221), TAIHE 太和县 (341222), FUNAN 阜南县 (341225), YINGSHANG 颍上县 (341226), JIESHOU 界首市 (341282), YONGQIAO 埇桥区 (341302), DANGSHAN 砀山县 (341321), XIAOXIAN 萧县 (341322), LINGBI 灵璧县 (341323), SIXIAN 泗县 (341324), JUCHAO 居巢区 (341402), LUJIANG 庐江县 (341421), WUWEI 无为县 (341422), HANSHAN 含山县 (341423), HEXIAN 和县 (341424), JINAN 金安区 (341502), YUAN 裕安区 (341503), SHOUXIAN 寿县 (341521), HUOQIU 霍邱县 (341522), SHUCHENG 舒城县 (341523), JINZHAI 金寨县 (341524), HUOSHAN 霍山县 (341525), QIAOCHENG 谯城区 (341602), GUOYANG 涡阳县 (341621), MENGCHENG 蒙城县 (341622), LIXIN 利辛县 (341623), GUICHI 贵池区 (341702), DONGZHI 东至县 (341721), SHITAI 石台县 (341722), QINGYANG 青阳县 (341723), XUANZHOU 宣州区 (341802), LANGXI 郎溪县 (341821), GUANGDE 广德县 (341822), JINGXIAN 泾县 (341823), JIXI 绩溪县 (341824), JINGDE 旌德县 (341825), NINGGUO 宁国市 (341881).
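A minimal pandas sketch of the kind of per-county merge described above; the folder layout and file names are assumptions for illustration:

import glob
import pandas as pd

# Hypothetical layout: one Excel file per county for a given census category.
frames = []
for path in sorted(glob.glob("census_raw/population/*.xlsx")):
    county = pd.read_excel(path)
    county["source_file"] = path  # keep track of the originating county file
    frames.append(county)

# Stack all counties into the single per-category data file.
pd.concat(frames, ignore_index=True).to_excel("census_merged/population.xlsx", index=False)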
This Excel spreadsheet is the result of merging, at the port level, several of the in-house fisheries databases in combination with other demographic databases such as the U.S. Census. The fisheries databases used include port listings, weighout (dealer) landings, permit information on homeports and owners' cities of residence, dealer permit information, and logbook records. Port names were consolidated in line with USGS and Census conventions, and typographical errors, non-conventional spellings, and other issues were corrected. Each row is a community, and the spreadsheet may contain confidential data since not all communities have 3 or more entities for the various variables.
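A hedged sketch of the port-name consolidation and port-level merge this description refers to; the mapping, file names, and column names are purely illustrative:

import pandas as pd

# Illustrative corrections from raw spellings to consolidated port names.
port_name_map = {
    "PT. JUDITH": "POINT JUDITH",
    "NEW BEDFRD": "NEW BEDFORD",
}

landings = pd.read_excel("weighout_landings.xlsx")   # hypothetical file name
permits = pd.read_excel("permit_homeports.xlsx")     # hypothetical file name

# Consolidate port names before merging so both tables use the same keys.
for table in (landings, permits):
    table["port"] = table["port"].str.strip().str.upper().replace(port_name_map)

# One row per community: aggregate each table by port, then join.
community = (
    landings.groupby("port", as_index=False).sum(numeric_only=True)
    .merge(permits.groupby("port", as_index=False).sum(numeric_only=True),
           on="port", how="outer")
)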
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a useful dataset to train and test machine learning forecasting algorithms and compare results with the official forecast from weekly pre-dispatch reports. The following considerations should be kept in mind when comparing forecasting results with the weekly pre-dispatch forecast: 1. Saturday is the first day of each weekly forecast, and Friday is the last day. 2. A 72-hour gap of unseen records should be kept before the first forecast day; in other words, each week's forecast should be made using records only up to the last hour of the preceding Tuesday (see the sketch below).
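A minimal pandas sketch of building one such train/forecast split; the file name, column names, and example date are assumptions:

import pandas as pd

# Hypothetical continuous dataset with hourly records indexed by datetime.
data = pd.read_csv("continuous_dataset.csv", parse_dates=["datetime"], index_col="datetime")

# Forecast week runs Saturday 00:00 through Friday 23:00.
week_start = pd.Timestamp("2019-06-01 00:00")             # a Saturday, chosen for illustration
week_end = week_start + pd.Timedelta(hours=7 * 24 - 1)

# 72-hour gap of unseen records before the first forecast day:
# the last usable training record is the preceding Tuesday at 23:00.
train_end = week_start - pd.Timedelta(hours=72 + 1)

train = data.loc[:train_end]
target_week = data.loc[week_start:week_end]
print(train.index.max(), target_week.index.min())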
Data sources provide hourly records. The data composition is the following: 1. Historical electricity load, available in daily post-dispatch reports from the grid operator (CND). 2. Historical weekly forecasts, available in weekly pre-dispatch reports, also from CND. 3. Calendar information related to school periods, from Panama's Ministry of Education. 4. Calendar information related to holidays, from the "When on Earth?" website. 5. Weather variables, such as temperature, relative humidity, precipitation, and wind speed, for three main cities in Panama, from Earthdata.
The original data sources provide the post-dispatch electricity load in individual daily Excel files and the weekly pre-dispatch electricity load forecast in individual weekly Excel files, both with hourly granularity. Holiday and school period data are sparse and scattered across websites and PDF files. Weather data are available in daily NetCDF files.
For simplicity, the published datasets are already pre-processed by merging all data sources on the date-time index: 1. A CSV file containing all records in a single continuous dataset with all variables. 2. A CSV file containing the load forecast from the weekly pre-dispatch reports. 3. Two Excel files containing suggested regressors and 14 training/testing dataset pairs as described in the PDF file.
Aguilar Madrid, Ernesto (2021), “Short-term electricity load forecasting (Panama case study)”, Mendeley Data, V1, doi: 10.17632/byx7sztj59.1
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Excel table providing the daily water level data from the National Park Service, Everglades National Park, the South Florida Water Management District, and the U.S. Geological Survey during 1974-2009 prior to editing. [All data are in the original vertical datum provided by the collecting organizations (see row titled "Datum"). Data were retrieved from the databases of the National Park Service - Everglades National Park, South Florida Water Management District, and U.S. Geological Survey in July and August 2012. The row titled "Notes" describes the subsequent changes that were made during editing in 2012, such as the elimination of redundant or unusable site files and merging of datasets collected at the same site. Notes are made concerning sites with anomalous data points that were eliminated during editing. These points are highlighted in red in the table. For computational purposes this table does not include data qualifiers such as those which indicate which data are estim ...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the data set for the essay "Automatic merging of separated construction plans of hydraulic structures" submitted for Bautechnik 5/22. The data set is structured as follows:
- The ZIP file "01 Original Data" contains 233 folders (named after the TU IDs) with the associated partial recordings in TIF format. The TIFs are binary compressed in CCITT Fax 4 format. 219 TUs are divided into two parts and 14 into three parts; the original data therefore consist of 480 partial recordings.
- The ZIP file "02 Interim Results" contains 233 folders (named after the TU IDs) with relevant intermediate results generated during stitching. This includes the input images scaled to 10 MP, the visualization of the feature assignment(s), and the result in downscaled resolution with visualized seam lines.
- The ZIP file "03_Results" contains the 170 successfully merged plans in high resolution in TIF format.
- The Excel file "Dataset" contains metadata on the 233 examined TUs, including the DOT graph of the assignment described in the work, the correctness rating of the results, and the assignment to the presented sources of error.
The data set was generated with the following metadata query in the IT system Digital Management of Technical Documents (DVtU): Microfilm metadata - TA (partial recording) - Number: "> 1"; Document metadata - Object part: "130 (Wehrwangen, weir pillars)" - Object ID no.: "213 (Weir systems)" - Detail: "*[Bb]wehrung*" - Version: "01.00.00"
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Three Excel sheets with location data (x and y coordinates) for coffee trees in the survey plots presented in Figures 3 and 4 of the article.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Excel spreadsheet containing, in separate sheets for each figure, the underlying and individual numerical data used for Figs 1B–1D, 2B, 2C, 3A–3D, 4A–4G, S1A, S1B, S1C, S1D, S1E, S1F,
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: The incidence and prevalence of neurodegenerative diseases (NDs) are growing worldwide. In an environment where healthcare resources are already stretched, it is important to optimise treatment choice to help alleviate healthcare burden. This rapid review aims to consolidate evidence on factors that influence healthcare professionals (HCPs) to prescribe medication for NDs and map them to theoretical models of behaviour change to identify the behavioural determinants that may help optimise prescribing.
Methods and findings: Embase and Ovid MEDLINE were used to identify relevant empirical research studies. Screening, data extraction and quality assessment were carried out by three independent reviewers to ensure consistency. Factors influencing prescribing were mapped to the Theoretical Domains Framework (TDF) and key behavioural determinants were described using the Capability, Opportunity, Motivation - Behaviour (COM-B) model. An initial 3,099 articles were identified, of which 53 were included for data extraction. Fifty-six factors influencing prescribing were identified and categorised into patient, HCP or healthcare system groups, then mapped to TDF and COM-B domains. Prescribing was influenced by the capability of HCPs, namely factors mapped to the decision making (e.g., patient age or symptom burden) and knowledge (e.g., clinical understanding) behavioural domains. However, most factors related to HCP opportunity, underpinned by factors mapped to the social (e.g., prescribing support or culture) and contextual (e.g., lack of resources or medication availability) domains. Less evidence was available on factors influencing the motivation of HCPs; where evident, factors primarily related to HCP beliefs about consequences (e.g., side effects) and professional identity (e.g., level of specialism) were described.
Conclusions: This systematic analysis of the literature provides an in-depth understanding of the behavioural determinants that may support optimised prescribing practices (e.g., drug costs or pressure from patients' family members). Understanding these determinants provides an opportunity to identify relevant intervention functions and behaviour change techniques to target the factors that directly influence HCP prescribing behaviour.