8 datasets found
  1. Business Activity Survey 2009 - Samoa

    • microdata.pacificdata.org
    Updated Jul 2, 2019
    Cite
    Samoa Bureau of Statistics (2019). Business Activity Survey 2009 - Samoa [Dataset]. https://microdata.pacificdata.org/index.php/catalog/253
    Explore at:
    Dataset updated
    Jul 2, 2019
    Dataset authored and provided by
    Samoa Bureau of Statistics
    Time period covered
    2009
    Area covered
    Samoa
    Description

    Abstract

    The intention is to collect data for the calendar year 2009 (or the nearest year for which each business keeps its accounts). The survey is considered a one-off survey, although for accurate NAs, such a survey should be conducted at least every five years to enable regular updating of the ratios, etc., needed to adjust the ongoing indicator data (mainly VAGST) to NA concepts. The questionnaire will be drafted by FSD, largely following the previous BAS, updated to current accounting terminology where necessary. The questionnaire will be pilot tested, using some accountants who are likely to complete a number of the forms on behalf of their business clients, and a small sample of businesses. Consultations will also include Ministry of Finance, Ministry of Commerce, Industry and Labour, Central Bank of Samoa (CBS), Samoa Tourism Authority, Chamber of Commerce, and other business associations (hotels, retail, etc.).

    The questionnaire will collect a number of items of information about the business ownership, locations at which it operates and each establishment for which detailed data can be provided (in the case of complex businesses), contact information, and other general information needed to clearly identify each unique business. The main body of the questionnaire will collect data on income and expenses, to enable value added to be derived accurately. The questionnaire will also collect data on capital formation, and will contain supplementary pages for relevant industries to collect volume of production data for selected commodities and to collect information to enable an estimate of value added generated by key tourism activities.

    The principal user of the data will be FSD which will incorporate the survey data into benchmarks for the NA, mainly on the current published production measure of GDP. The information on capital formation and other relevant data will also be incorporated into the experimental estimates of expenditure on GDP. The supplementary data on volumes of production will be used by FSD to redevelop the industrial production index which has recently been transferred under the SBS from the CBS. The general information about the business ownership, etc., will be used to update the Business Register.

    Outputs will be produced in a number of formats, including a printed report containing descriptive information of the survey design, data tables, and analysis of the results. The report will also be made available on the SBS website in “.pdf” format, and the tables will be available on the SBS website as Excel files. Data by region may also be produced, although at a higher level of aggregation than the national data. All data will be fully confidentialised, to protect the anonymity of all respondents. Consideration may also be made to provide, for selected analytical users, confidentialised unit record files (CURFs).

    A high level of accuracy is needed because the principal purpose of the survey is to develop revised benchmarks for the NA. The initial plan was that the survey will be conducted as a stratified sample survey, with full enumeration of large establishments and a sample of the remainder.

    Geographic coverage

    National Coverage

    Analysis unit

    The main statistical unit to be used for the survey is the establishment. For simple businesses that undertake a single activity at a single location there is a one-to-one relationship between the establishment and the enterprise. For large and complex enterprises, however, it is desirable to separate each activity of an enterprise into establishments to provide the most detailed information possible for industrial analysis. The business register will need to be developed in such a way that records the links between establishments and their parent enterprises. The business register will be created from administrative records and may not have enough information to recognize all establishments of complex enterprises. Large businesses will be contacted prior to the survey post-out to determine if they have separate establishments. If so, the extended structure of the enterprise will be recorded on the business register and a questionnaire will be sent to the enterprise to be completed for each establishment.

    SBS has decided to follow the New Zealand simplified version of its statistical units model for the 2009 BAS. Future surveys may consider location units and enterprise groups if they are found to be useful for statistical collections.

    It should be noted that while establishment data may enable the derivation of detailed benchmark accounts, it may be necessary to aggregate up to enterprise level data for the benchmarks if the ongoing data used to extrapolate the benchmark forward (mainly VAGST) are only available at the enterprise level.

    Universe

    The BAS covered all employing units and excluded small non-employing units, such as market sellers. It also excluded central government agencies engaged in public administration (ministries, public education and health, etc.). Only businesses that pay VAGST are covered (threshold of SAT$75,000 and upwards).

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    - Total sample size was 1,240.
    - Of the 1,240, 902 successfully completed the questionnaire.
    - The remaining 338 either never responded or were omitted (some businesses were omitted from the sample because they did not meet the requirement to be surveyed).
    - Selection covered all employing units paying VAGST (threshold of SAT$75,000 upwards).

    To be confirmed.

    What is not forbidden is allowed, right? :-)

    Mode of data collection

    Mail Questionnaire [mail]

    Research instrument

    1. General instructions, authority for the survey, etc;
    2. Business demography information on ownership, contact details, structure, etc.;
    3. Employment;
    4. Income;
    5. Expenses;
    6. Inventories;
    7. Profit or loss and reconciliation to business accounts' profit and loss;
    8. Fixed assets - purchases, disposals, book values
    9. Thank you and signature of respondent.

    Supplementary Pages

    Additional pages have been prepared to collect data for a limited range of industries.

    1. Production data. To rebase and redevelop the Industrial Production Index (IPI), it is intended to collect volume of production information from a selection of large manufacturing businesses. The selection of businesses and products is critical to the usefulness of the IPI. The products must be homogeneous, and be of enough importance to the economy to justify collecting the data. Significance criteria should be established for the selection of products to include in the IPI, and the 2009 BAS provides an opportunity to collect benchmark data for a range of products known to be significant (based on information in the existing IPI, CPI weights, export data, etc.) as well as open questions for respondents to provide information on other significant products.

    2. Tourism. There is a strong demand for estimates of tourism value added. To estimate tourism value added using the international standard Tourism Satellite Account methodology requires the use of an input-output table, which is beyond the capacity of SBS at present. However, some indicative estimates of the main parts of the economy influenced by tourism can be derived if the necessary data are collected. Tourism is a demand concept, based on defining tourists (the international standard includes both international and domestic tourists), what products are characteristically purchased by tourists, and which industries supply those products. Some questions targeted at those industries that have significant involvement with tourists (hotels, restaurants, transport and tour operators, vehicle hire, etc.), on how much of their income is sourced from tourism, would provide valuable indicators of the size of the direct impact of tourism.

    Cleaning operations

    Partial imputation was done at the time of receipt of questionnaires, after the follow-up procedures to obtain fully completed questionnaires had been exhausted. Imputation followed a set process: ratios from responding units in the same imputation cell were applied to the partial data that was supplied. Procedures were established during the editing stage (a) to preserve the integrity of the questionnaires as supplied by respondents, and (b) to record all changes made to the questionnaires during editing. If SBS staff write on a form, for example, this should be done only in red pen, to distinguish the alterations from the original information.
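    As an illustration of the ratio-imputation step described above, the following is a minimal pandas sketch under assumed names (the imputation cells, variables, and figures are hypothetical, not taken from the survey files): the expenses-to-income ratio among fully responding units in each cell is applied to fill in missing expenses for partial responses.

    import pandas as pd

    # Hypothetical questionnaire extract: one row per establishment.
    df = pd.DataFrame({
        "imputation_cell": ["retail", "retail", "retail", "hotels", "hotels"],
        "income":          [100.0,    250.0,    180.0,    400.0,    320.0],
        "expenses":        [ 70.0,    175.0,    None,     300.0,    None],  # None = item non-response
    })

    # Expenses-to-income ratio among fully responding units, per imputation cell.
    responders = df.dropna(subset=["expenses"])
    sums = responders.groupby("imputation_cell")[["expenses", "income"]].sum()
    cell_ratio = sums["expenses"] / sums["income"]

    # Apply the cell ratio to the reported income of units with missing expenses.
    missing = df["expenses"].isna()
    df.loc[missing, "expenses"] = (
        df.loc[missing, "imputation_cell"].map(cell_ratio) * df.loc[missing, "income"]
    )

    print(df)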

    Additional edit checks were developed, including checking against external data at the enterprise/establishment level. External data to be checked against include VAGST (for turnover and purchases) and SNPF (for salaries and wages and for employment). Editing and imputation processes were undertaken by FSD using Excel.
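    The enterprise-level consistency checks against administrative sources can be sketched in the same way; in the pandas fragment below, the identifiers, turnover figures, and the tolerance band are illustrative assumptions, not SBS editing rules.

    import pandas as pd

    # Hypothetical extract: survey turnover alongside administrative (VAGST) turnover.
    checks = pd.DataFrame({
        "enterprise_id":   ["E001", "E002", "E003"],
        "survey_turnover": [120_000.0, 85_000.0, 410_000.0],
        "vagst_turnover":  [118_500.0, 160_000.0, 402_000.0],
    })

    # Flag records where the survey figure falls outside a tolerance band around the VAGST figure.
    checks["ratio"] = checks["survey_turnover"] / checks["vagst_turnover"]
    checks["query_for_editing"] = (checks["ratio"] < 0.8) | (checks["ratio"] > 1.25)

    print(checks[checks["query_for_editing"]])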

    Sampling error estimates

    Not applicable.

  2. Data from: Elemental vacancy diffusion database from high-throughput...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Wu, Henry (2020). Elemental vacancy diffusion database from high-throughput first-principles calculations for fcc and hcp structures [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_17888
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Morgan, Dane
    Angsten, Thomas
    Wu, Henry
    Mayeshiba, Tam
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This work demonstrates how databases of diffusion-related properties can be developed from high-throughput ab initio calculations. The formation and migration energies for vacancies of all adequately stable pure elements in both the face-centered cubic (fcc) and hexagonal close packing (hcp) crystal structures were determined using ab initio calculations. For hcp migration, both the basal plane and z-direction nearest-neighbor vacancy hops were considered. Energy barriers were successfully calculated for 49 elements in the fcc structure and 44 elements in the hcp structure. These data were plotted against various elemental properties in order to discover significant correlations. The calculated data show smooth and continuous trends when plotted against Mendeleev numbers. The vacancy formation energies were plotted against cohesive energies to produce linear trends with regressed slopes of 0.317 and 0.323 for the fcc and hcp structures respectively. This result shows the expected increase in vacancy formation energy with stronger bonding. The slope of approximately 0.3, being well below that predicted by a simple fixed bond strength model, is consistent with a reduction in the vacancy formation energy due to many-body effects and relaxation. Vacancy migration barriers are found to increase nearly linearly with increasing stiffness, consistent with the local expansion required to migrate an atom. A simple semi-empirical expression is created to predict the vacancy migration energy from the lattice constant and bulk modulus for fcc systems, yielding estimates with errors of approximately 30%.
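    The reported near-linear relation between vacancy formation energy and cohesive energy (slope of roughly 0.3) can be illustrated with a short regression sketch; the values below are placeholders rather than entries from the dataset, so the fitted slope only demonstrates the procedure.

    import numpy as np

    # Placeholder (cohesive energy, vacancy formation energy) pairs in eV; the real values
    # are in the fcc/hcp data tables distributed in figure_excel_files.zip.
    e_coh = np.array([1.5, 2.8, 3.9, 4.4, 5.3, 6.7, 8.1])
    e_vf  = np.array([0.5, 0.9, 1.2, 1.4, 1.7, 2.1, 2.6])

    # Least-squares slope of E_vf versus E_coh; the abstract reports ~0.317 (fcc) and ~0.323 (hcp).
    slope, intercept = np.polyfit(e_coh, e_vf, 1)
    print(f"fitted slope: {slope:.3f}, intercept: {intercept:.3f} eV")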

    Files:

    figure_excel_files.zip:

    Excel files for figures in the publication, and excel files of main data tables for FCC and HCP vacancy formation energies and vacancy migration energies.

    fcc_hvf_hvm.tar.gz and hcp_hvf_hvm.tar.gz:

    Raw VASP files corresponding to FCC and HCP vacancy formation energies and vacancy migration energies.

    bulk_modulus.tar.gz:

    Raw VASP files corresponding to FCC bulk modulus calculations.

  3. Data from: Lost in Translation: A Study of Bugs Introduced by Large Language...

    • data.niaid.nih.gov
    Updated Jan 25, 2024
    Cite
    Ibrahimzada, Ali Reza (2024). Lost in Translation: A Study of Bugs Introduced by Large Language Models while Translating Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8190051
    Explore at:
    Dataset updated
    Jan 25, 2024
    Dataset authored and provided by
    Ibrahimzada, Ali Reza
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Artifact repository for the paper Lost in Translation: A Study of Bugs Introduced by Large Language Models while Translating Code, accepted at ICSE 2024, Lisbon, Portugal. Authors are Rangeet Pan*, Ali Reza Ibrahimzada*, Rahul Krishna, Divya Sankar, Lambert Pougeum Wassi, Michele Merler, Boris Sobolev, Raju Pavuluri, Saurabh Sinha, and Reyhaneh Jabbarvand.

    Install

    This repository contains the source code for reproducing the results in our paper. Please start by cloning this repository:

    git clone https://github.com/Intelligent-CAT-Lab/PLTranslationEmpirical

    We recommend using a virtual environment for running the scripts. Please download conda 23.11.0 from this link. You can create a virtual environment using the following command:

    conda create -n plempirical python=3.10.13

    After creating the virtual environment, you can activate it using the following command:

    conda activate plempirical

    You can run the following command to make sure that you are using the correct version of Python:

    python3 --version && pip3 --version

    Dependencies

    To install all software dependencies, please execute the following command:

    pip3 install -r requirements.txt

    As for hardware dependencies, we used 16 NVIDIA A100 GPUs with 80GBs of memory for inferencing models. The models can be inferenced on any combination of GPUs as long as the reader can properly distribute the model weights across the GPUs. We did not perform weight distribution since we had enough memory (80 GB) per GPU.

    Moreover, for compiling and testing the generated translations, we used Python 3.10, g++ 11, GCC Clang 14.0, Java 11, Go 1.20, Rust 1.73, and .Net 7.0.14 for Python, C++, C, Java, Go, Rust, and C#, respectively. Overall, we recommend using a machine with Linux OS and at least 32GB of RAM for running the scripts.

    For running scripts of alternative approaches, you need to make sure you have installed C2Rust, CxGO, and Java2C# on your machine. Please refer to their repositories for installation instructions. For Java2C#, you need to create a .csproj file like below:

    <Project Sdk="Microsoft.NET.Sdk">
      <PropertyGroup>
        <OutputType>Exe</OutputType>
        <TargetFramework>net7.0</TargetFramework>
        <ImplicitUsings>enable</ImplicitUsings>
        <Nullable>enable</Nullable>
      </PropertyGroup>
    </Project>

    Dataset

    We uploaded the dataset we used in our empirical study to Zenodo. The dataset is organized as follows:

    CodeNet

    AVATAR

    Evalplus

    Apache Commons-CLI

    Click

    Please download and unzip the dataset.zip file from Zenodo. After unzipping, you should see the following directory structure:

    PLTranslationEmpirical
    ├── dataset
    │   ├── codenet
    │   ├── avatar
    │   ├── evalplus
    │   ├── real-life-cli
    ├── ...

    The structure of each dataset is as follows:

    1. CodeNet & Avatar: Each directory in these datasets corresponds to a source language and includes two directories, Code and TestCases, for code snippets and test cases, respectively. Each code snippet has an id in its filename, and the id is used as a prefix for the test I/O files (see the sketch after this list).

    2. Evalplus: The source language code snippets follow a similar structure as CodeNet and Avatar. However, as a one-time effort, we manually created the test cases in the target Java language inside a Maven project, evalplus_java. To evaluate the translations from an LLM, we recommend moving the generated Java code snippets to the src/main/java directory of the Maven project and then running the command mvn clean test surefire-report:report -Dmaven.test.failure.ignore=true to compile, test, and generate reports for the translations.

    3. Real-life Projects: The real-life-cli directory represents two real-life CLI projects from Java and Python. These datasets only contain code snippets as files and no test cases. As mentioned in the paper, the authors manually evaluated the translations for these datasets.
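    To make the id-prefix convention concrete, here is a hypothetical Python sketch that pairs each code snippet in one language directory with its test I/O files; the test-file naming (<id>_in.txt / <id>_out.txt) is an assumption for illustration and may differ from the actual layout in dataset.zip.

    from pathlib import Path

    # Hypothetical layout: dataset/codenet/Python/Code/<id>.py and
    # dataset/codenet/Python/TestCases/<id>_in.txt / <id>_out.txt.
    lang_dir = Path("dataset/codenet/Python")

    pairs = []
    for snippet in sorted((lang_dir / "Code").glob("*.py")):
        snippet_id = snippet.stem
        test_in = lang_dir / "TestCases" / f"{snippet_id}_in.txt"
        test_out = lang_dir / "TestCases" / f"{snippet_id}_out.txt"
        if test_in.exists() and test_out.exists():
            pairs.append((snippet, test_in, test_out))

    print(f"found {len(pairs)} snippet/test pairs")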

    Scripts

    We provide bash scripts for reproducing our results in this work. First, we discuss the translation script. To run translation with a model and dataset, you first need to create a .env file in the repository and add the following:

    OPENAI_API_KEY=
    LLAMA2_AUTH_TOKEN=
    STARCODER_AUTH_TOKEN=

    1. Translation with GPT-4: You can run the following command to translate all Python -> Java code snippets in the codenet dataset with GPT-4, using top-k sampling with k=50, top-p sampling with p=0.95, and temperature=0.7:

    bash scripts/translate.sh GPT-4 codenet Python Java 50 0.95 0.7 0

    2. Translation with CodeGeeX: Prior to running the script, you need to clone the CodeGeeX repository from here and use the instructions from their artifacts to download their model weights. After cloning it inside PLTranslationEmpirical and downloading the model weights, your directory structure should be like the following:

    PLTranslationEmpirical
    ├── dataset
    │   ├── codenet
    │   ├── avatar
    │   ├── evalplus
    │   ├── real-life-cli
    ├── CodeGeeX
    │   ├── codegeex
    │   ├── codegeex_13b.pt # this file is the model weight
    │   ├── ...
    ├── ...

    You can run the following command to translate all Python -> Java code snippets in the codenet dataset with CodeGeeX, using top-k sampling with k=50, top-p sampling with p=0.95, and temperature=0.2 on GPU gpu_id=0:

    bash scripts/translate.sh CodeGeeX codenet Python Java 50 0.95 0.2 0

    3. For all other models (StarCoder, CodeGen, LLaMa, TB-Airoboros, TB-Vicuna), you can execute the following command to translate all Python -> Java code snippets in the codenet dataset with StarCoder|CodeGen|LLaMa|TB-Airoboros|TB-Vicuna, using top-k sampling with k=50, top-p sampling with p=0.95, and temperature=0.2 on GPU gpu_id=0:

    bash scripts/translate.sh StarCoder codenet Python Java 50 0.95 0.2 0

    4. For translating and testing pairs with traditional techniques (i.e., C2Rust, CxGO, Java2C#), you can run the following commands:

    bash scripts/translate_transpiler.sh codenet C Rust c2rust fix_report
    bash scripts/translate_transpiler.sh codenet C Go cxgo fix_reports
    bash scripts/translate_transpiler.sh codenet Java C# java2c# fix_reports
    bash scripts/translate_transpiler.sh avatar Java C# java2c# fix_reports

    5. For compiling and testing CodeNet, AVATAR, and Evalplus (Python to Java) translations from GPT-4, and generating fix reports, you can run the following commands:

    bash scripts/test_avatar.sh Python Java GPT-4 fix_reports 1
    bash scripts/test_codenet.sh Python Java GPT-4 fix_reports 1
    bash scripts/test_evalplus.sh Python Java GPT-4 fix_reports 1

    6. For repairing unsuccessful translations of Java -> Python in the CodeNet dataset with GPT-4, you can run the following commands:

    bash scripts/repair.sh GPT-4 codenet Python Java 50 0.95 0.7 0 1 compile
    bash scripts/repair.sh GPT-4 codenet Python Java 50 0.95 0.7 0 1 runtime
    bash scripts/repair.sh GPT-4 codenet Python Java 50 0.95 0.7 0 1 incorrect

    7. For cleaning translations of open-source LLMs (i.e., StarCoder) in codenet, you can run the following command:

    bash scripts/clean_generations.sh StarCoder codenet

    Please note that for the above commands, you can change the dataset and model name to execute the same thing for other datasets and models. Moreover, you can refer to /prompts for different vanilla and repair prompts used in our study.

    Artifacts

    Please download the artifacts.zip file from our Zenodo repository. We have organized the artifacts as follows:

    RQ1 - Translations: This directory contains the translations from all LLMs and for all datasets. We have added an excel file to show a detailed breakdown of the translation results.

    RQ2 - Manual Labeling: This directory contains an excel file which includes the manual labeling results for all translation bugs.

    RQ3 - Alternative Approaches: This directory contains the translations from all alternative approaches (i.e., C2Rust, CxGO, Java2C#). We have added an excel file to show a detailed breakdown of the translation results.

    RQ4 - Mitigating Translation Bugs: This directory contains the fix results of GPT-4, StarCoder, CodeGen, and Llama 2. We have added an excel file to show a detailed breakdown of the fix results.

    Contact

    We look forward to hearing your feedback. Please contact Rangeet Pan or Ali Reza Ibrahimzada for any questions or comments 🙏.

  4. Measuring Bulk Crystallographic Texture from Ti-6Al-4V Hot-Rolled Sample...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 3, 2023
    Cite
    Daniel, Christopher Stuart (2023). Measuring Bulk Crystallographic Texture from Ti-6Al-4V Hot-Rolled Sample Matrices using Synchrotron X-ray Diffraction (Analysis Dataset) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7437908
    Explore at:
    Dataset updated
    Feb 3, 2023
    Dataset provided by
    Quinta da Fonseca, João
    Zeng, Xiaohan
    Daniel, Christopher Stuart
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    A dataset of synchrotron X-ray diffraction (SXRD) analysis files, recording the refinement of crystallographic texture from a number of Ti-6Al-4V (Ti-64) sample matrices, containing a total of 93 hot-rolled samples, from three different orthogonal sample directions. The aim of the work was to accurately quantify bulk macro-texture for both the α (hexagonal close packed, hcp) and β (body-centred cubic, bcc) phases across a range of different processing conditions.

    Material

    Prior to the experiment, the Ti-64 materials had been hot-rolled at a range of different temperatures, and to different reductions, followed by air-cooling, using a rolling mill at The University of Manchester. Rectangular specimens (6 mm x 5 mm x 2 mm) were then machined from the centre of these rolled blocks, and from the starting material. The samples were cut along different orthogonal rolling directions and are referenced according to alignment of the rolling directions (RD – rolling direction, TD – transverse direction, ND – normal direction) with the long horizontal (X) axis and short vertical (Y) axis of the rectangular specimens. Samples of the same orientation were glued together to form matrices for the synchrotron analysis. The material, rolling conditions, sample orientations and experiment reference numbers used for the synchrotron diffraction analysis are included in the data as an excel spreadsheet.

    SXRD Data Collection

    Data was recorded using a high energy 90 keV synchrotron X-ray beam and a 5 second exposure at the detector for each measurement point. The slits were adjusted to give a 0.5 x 0.5 mm beam area, chosen to optimally resolve both the α and β phase peaks. The SXRD data was recorded by stage-scanning the beam in sequential X-Y positions at 0.5 mm increments across the rectangular sample matrices, containing a number of samples glued together, to analyse a total of 93 samples from the different processing conditions and orientations. Post-processing of the data was then used to sort the data into a rectangular grid of measurement points from each individual sample.

    Diffraction Pattern Averaging

    The stage-scan diffraction pattern images from each matrix were sorted into individual samples, and the images averaged together for each specimen, using a Python notebook sxrd-tiff-summer. The averaged .tiff images each capture average diffraction peak intensities from an area of about 30 mm² (equivalent to a total volume of ~60 mm³), with three different sample orientations then used to calculate the bulk crystallographic texture from each rolling condition.
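    The averaging step can be sketched as follows; this is a minimal illustration under assumed file locations, not the sxrd-tiff-summer notebook itself.

    import numpy as np
    import imageio.v3 as iio
    from pathlib import Path

    # Hypothetical folder of stage-scan .tiff patterns already sorted for one specimen.
    pattern_dir = Path("sorted_patterns/sample_RD_01")

    # Load each diffraction image and average in float64 to avoid overflow of the detector counts.
    images = [iio.imread(p).astype(np.float64) for p in sorted(pattern_dir.glob("*.tiff"))]
    averaged = np.mean(images, axis=0)

    # Save the averaged pattern for the downstream Continuous-Peak-Fit analysis.
    iio.imwrite(pattern_dir / "averaged.tiff", averaged.astype(np.float32))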

    SXRD Data Analysis

    A new Fourier-based peak fitting method from the Continuous-Peak-Fit Python package was used to fit full diffraction pattern ring intensities, using a range of different lattice plane peaks for determining crystallographic texture in both the α and β phases. Bulk texture was calculated by combining the ring intensities from three different sample orientations.

    A .poni calibration file was created using Dioptas, through a refinement matching peak intensities from a LaB6 or CeO2 standard diffraction pattern image. Two calibrations were needed as some of the data was collected in July 2022 and some of the data was collected in August 2022. Dioptas was then used to determine peak bounds in 2θ for characterising a total of 22 α and 4 β lattice plane rings from the averaged Ti-64 diffraction pattern images, which were recorded in a .py input script. Using these two inputs, Continuous-Peak-Fit automatically converts full diffraction pattern rings into profiles of intensity versus azimuthal angle, for each 2θ section, which can also include multiple overlapping α and β peaks.

    The Continuous-Peak-Fit refinement can be launched in a notebook or from the terminal, to automatically calculate a full mathematical description, in the form of Fourier expansion terms, to match the intensity variation of each individual lattice plane ring. The results for peak position, intensity and half-width for all 22 α and 4 β lattice plane peaks were recorded at an azimuthal resolution of 1º and stored in a .fit output file. Details for setting up and running this analysis can be found in the continuous-peak-fit-analysis package. This package also includes a Python script for extracting lattice plane ring intensity distributions from the .fit files, matching the intensity values with spherical polar coordinates to parametrise the intensity distributions from each of the three different sample orientations, in the form of pole figures. The script can also be used to combine intensity distributions from different sample orientations. The final intensity variations are recorded for each of the lattice plane peaks as text files, which can be loaded into MTEX to plot and analyse both the α and β phase crystallographic texture.
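    The underlying step of turning a calibrated diffraction image into an intensity-versus-azimuth profile for a single 2θ window can be illustrated with pyFAI, which reads the same Dioptas-style .poni calibration; this is only a rough sketch with placeholder file names and 2θ bounds, not the Continuous-Peak-Fit workflow itself.

    import numpy as np
    import pyFAI
    import fabio

    # Load the .poni calibration and an averaged pattern (placeholder file names).
    ai = pyFAI.load("calibration_july2022.poni")
    img = fabio.open("averaged.tiff").data

    # "Cake" the image: intensity on a grid of 2theta (radial) x azimuth bins.
    res = ai.integrate2d(img, npt_rad=2000, npt_azim=360, unit="2th_deg")

    # Select one lattice-plane ring by its 2theta bounds (placeholder values) and
    # average over the radial direction to get intensity versus azimuthal angle.
    tth_lo, tth_hi = 3.05, 3.20
    ring = (res.radial >= tth_lo) & (res.radial <= tth_hi)
    ring_profile = res.intensity[:, ring].mean(axis=1)  # one value per azimuth bin

    print("azimuth of maximum ring intensity:", res.azimuthal[np.argmax(ring_profile)])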

    Metadata

    An accompanying YAML text file contains associated SXRD beamline metadata for each measurement. The raw data is in the form of synchrotron diffraction pattern .tiff images which were too large to upload to Zenodo and are instead stored on The University of Manchester's Research Database Storage (RDS) repository. The raw data can therefore be obtained by emailing the authors.

    The material data folder documents the machining of the samples and the sample orientations.

    The associated processing metadata for the Continuous-Peak-Fit analyses records information about the different packages used to process the data, along with details about the different files contained within this analysis dataset.

  5. GP Practice Prescribing Presentation-level Data - February 2015

    • digital.nhs.uk
    csv, zip
    Updated May 29, 2015
    Cite
    (2015). GP Practice Prescribing Presentation-level Data - February 2015 [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/practice-level-prescribing-data
    Explore at:
    Available download formats: csv (278.4 kB), csv (1.4 GB), zip (246.4 MB), csv (1.7 MB)
    Dataset updated
    May 29, 2015
    License

    https://digital.nhs.uk/about-nhs-digital/terms-and-conditions

    Time period covered
    Feb 1, 2015 - Feb 28, 2015
    Area covered
    United Kingdom
    Description

    Warning: Large file size (over 1GB). Each monthly data set is large (over 4 million rows), but can be viewed in standard software such as Microsoft WordPad (save by right-clicking on the file name and selecting 'Save Target As', or equivalent on Mac OSX). It is then possible to select the required rows of data and copy and paste the information into another software application, such as a spreadsheet. Alternatively, add-ons to existing software that can handle larger data sets, such as the Microsoft PowerPivot add-on for Excel, can be used. The Microsoft PowerPivot add-on for Excel is available using the link in the 'Related Links' section below. Once PowerPivot has been installed, follow the instructions below to load the large files. Note that it may take at least 20 to 30 minutes to load one monthly file.

    1. Start Excel as normal.
    2. Click on the PowerPivot tab.
    3. Click on the PowerPivot Window icon (top left).
    4. In the PowerPivot Window, click on the "From Other Sources" icon.
    5. In the Table Import Wizard, scroll to the bottom and select Text File.
    6. Browse to the file you want to open and choose the file extension you require, e.g. CSV.

    Once the data has been imported you can view it in a spreadsheet.

    What does the data cover? General practice prescribing data is a list of all medicines, dressings and appliances that are prescribed and dispensed each month. A record will only be produced when this has occurred; there is no record for a zero total. For each practice in England, the following information is presented at presentation level for each medicine, dressing and appliance (by presentation name):

    - the total number of items prescribed and dispensed
    - the total net ingredient cost
    - the total actual cost
    - the total quantity

    The data covers NHS prescriptions written in England and dispensed in the community in the UK. Prescriptions written in England but dispensed outside England are included. The data includes prescriptions written by GPs and other non-medical prescribers (such as nurses and pharmacists) who are attached to GP practices. GP practices are identified only by their national code, so an additional data file - linked to the first by the practice code - provides further detail in relation to the practice. Presentations are identified only by their BNF code, so an additional data file - linked to the first by the BNF code - provides the chemical name for that presentation. Please note that on 1 June 2015, the broken link for the Self-extracting files was fixed.
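    The practice-code and BNF-code linkage described above can be sketched in pandas; the file names and column headers below are assumptions made for illustration (check the downloaded CSV headers), and the presentation-level file is read in chunks because of its size.

    import pandas as pd

    # Hypothetical file names; substitute the actual February 2015 downloads.
    PRESCRIBING_CSV = "prescribing_feb2015.csv"   # presentation-level data (over 4 million rows)
    PRACTICE_CSV = "practice_details.csv"         # practice code -> practice details
    CHEMICAL_CSV = "chemical_names.csv"           # BNF code -> chemical name

    practices = pd.read_csv(PRACTICE_CSV)
    chemicals = pd.read_csv(CHEMICAL_CSV)

    # Read the large presentation-level file in chunks and join the lookup files on the way;
    # the join keys ("PRACTICE", "BNF_CODE") are assumed column names.
    chunks = []
    for chunk in pd.read_csv(PRESCRIBING_CSV, chunksize=500_000):
        chunk = chunk.merge(practices, on="PRACTICE", how="left")
        chunk = chunk.merge(chemicals, on="BNF_CODE", how="left")
        chunks.append(chunk)

    linked = pd.concat(chunks, ignore_index=True)
    print(linked.head())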

  6. Data supporting the study.

    • figshare.com
    xlsx
    Updated May 9, 2024
    Cite
    Liknaw Bewket Zeleke; Alec Welsh; Gedefaw Abeje; Marjan Khejahei (2024). Data supporting the study. [Dataset]. http://doi.org/10.1371/journal.pone.0303020.s005
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 9, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Liknaw Bewket Zeleke; Alec Welsh; Gedefaw Abeje; Marjan Khejahei
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Background: Obstetric fistula is a serious and debilitating problem resulting from tissue necrosis on the reproductive and urinary and/or lower gastrointestinal tract organs due to prolonged labor. Primary studies of the treatment of obstetric fistulae report significantly variable treatment outcomes following surgical repair. However, no systematic review and meta-analysis has yet estimated the pooled proportion and identified the determinants of successful obstetric fistula surgical repair.

    Objective: To estimate the proportion and identify the determinants of successful surgical repair of obstetric fistulae in low- and middle-income countries.

    Methods: The protocol was developed and registered at the International Prospective Register of Systematic Reviews (ID CRD42022323630). Searches of PubMed, Embase, CINAHL, Scopus databases, and gray literature sources were performed. All the accessed studies were selected with Covidence, and the quality of the studies was examined. Finally, the data were extracted using Excel and analyzed with R software.

    Results: This review included 79 studies out of 9337 following the screening process. The analysis reveals that 77.85% (95% CI: 75.14% to 80.56%) of surgical repairs in low and middle-income countries are successful. Women who attain primary education and above, are married, and have alive neonatal outcomes are more likely to have successful repair outcomes. In contrast, women with female genital mutilation, primiparity, a large fistula size, a fistula classification of II and above, urethral damage, vaginal scarring, a circumferential defect, multiple fistulae, prior repair and postoperative complications are less likely to have successful repair outcomes.

    Conclusion: The proportion of successful surgical repairs of obstetric fistula in low and middle-income countries remains suboptimal. Hence, stakeholders and policymakers must design and implement policies promoting women’s education. In addition, fistula care providers need to reach and manage obstetric fistula cases early before complications, like vaginal fibrosis, occur.

  7. Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program...

    • openicpsr.org
    Updated May 18, 2018
    Cite
    Jacob Kaplan (2018). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program Data: Hate Crime Data 1991-2022 [Dataset]. http://doi.org/10.3886/E103500V10
    Explore at:
    Dataset updated
    May 18, 2018
    Dataset provided by
    Princeton University
    Authors
    Jacob Kaplan
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    1991 - 2021
    Area covered
    United States
    Description

    WARNING: This dataset has a large number of flaws and is unable to properly answer many questions that people generally use it to answer, such as whether national hate crimes are changing (or at least they use the data so improperly that they get the wrong answer). A large number of people using this data (academics, advocates, reporters, US Congress) do so inappropriately and get the wrong answer to their questions as a result. Indeed, many published papers using this data should be retracted. Before using this data I highly recommend that you thoroughly read my book on UCR data, particularly the chapter on hate crimes (https://ucrbook.com/hate-crimes.html), as well as the FBI's own manual on this data. The questions you could potentially answer well are relatively narrow and generally exclude any causal relationships.

    For a comprehensive guide to this data and other UCR data, please see my book at ucrbook.com.

    Version 10 release notes: Adds 2022 data.
    Version 9 release notes: Adds 2021 data.
    Version 8 release notes: Adds 2019 and 2020 data. Please note that the FBI has retired UCR data ending in 2020, so this will be the last UCR hate crime data they release. Changes .rda file to .rds.
    Version 7 release notes: Changes release notes description, does not change data.
    Version 6 release notes: Adds 2018 data.
    Version 5 release notes: Adds data in the following formats: SPSS, SAS, and Excel. Changes project name to avoid confusing this data for the ones done by NACJD. Adds data for 1991. Fixes bug where bias motivation "anti-lesbian, gay, bisexual, or transgender, mixed group (lgbt)" was labeled "anti-homosexual (gay and lesbian)" prior to 2013, causing there to be two columns and zero values for years with the wrong label. All data is now directly from the FBI, not NACJD. The data initially comes as ASCII+SPSS Setup files and is read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R.
    Version 4 release notes: Adds data for 2017. Adds rows that submitted a zero-report (i.e. that agency reported no hate crimes in the year); this is for all years 1992-2017. Made changes to categorical variables (e.g. bias motivation columns) to make categories consistent over time; different years had slightly different names (e.g. 'anti-am indian' and 'anti-american indian') which I made consistent. Made the 'population' column, which is the total population in that agency.
    Version 3 release notes: Adds data for 2016. Orders rows by year (descending) and ORI.
    Version 2 release notes: Fixes bug where Philadelphia Police Department had an incorrect FIPS county code.

    The Hate Crime data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains information about hate crimes reported in the United States. Please note that the files are quite large and may take some time to open. Each row indicates a hate crime incident for an agency in a given year. I have made a unique ID column ("unique_id") by combining the year, agency ORI9 (the 9 character Originating Identifier code), and incident number columns together. Each column is a variable related to that incident or to the reporting agency. Some of the important columns are the incident date, what crime occurred (up to 10 crimes), the number of victims for each of these crimes, the bias motivation for each of these crimes, and the location of each crime. It also includes the total number of victims, total number of offenders, and race of offenders (as a group). Finally, it has a number of columns indicating if the victim for each offense was a certain type of victim or not (e.g. individual victim, business victim, religious victim, etc.).

    The only changes I made to the data are the following: minor changes to column names to make all column names 32 characters or fewer (so it can be saved in a Stata format), made all character values lower case, and reordered columns. I also generated incident month, weekday, and month-day variables from the incident date variable included in the original data.
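    A pandas sketch of the unique-ID construction and the date-derived columns described above follows; the input column names (year, ori9, incident_number, incident_date) and example values are assumptions for illustration, not confirmed headers from the released files.

    import pandas as pd

    # Assumed column names and example values; check the actual file headers before use.
    df = pd.DataFrame({
        "year": [2021, 2021],
        "ori9": ["PAPEP0000", "NY0303000"],
        "incident_number": ["000123", "000456"],
        "incident_date": ["2021-03-14", "2021-11-02"],
    })

    # unique_id = year + agency ORI9 + incident number, as described in the documentation.
    df["unique_id"] = (
        df["year"].astype(str) + df["ori9"] + df["incident_number"].astype(str)
    )

    # Derive month, weekday, and month-day from the incident date.
    dates = pd.to_datetime(df["incident_date"])
    df["incident_month"] = dates.dt.month_name().str.lower()
    df["incident_weekday"] = dates.dt.day_name().str.lower()
    df["incident_month_day"] = dates.dt.strftime("%m-%d")

    print(df)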

  8. Empirical formula calculation data of final bottom elevation of dam breach

    • tpdc.ac.cn
    • data.tpdc.ac.cn
    zip
    Updated Mar 7, 2022
    Cite
    Xinhua ZHANG (2022). Empirical formula calculation data of final bottom elevation of dam breach [Dataset]. http://doi.org/10.11888/HumanNat.tpdc.272075
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 7, 2022
    Dataset provided by
    National Tibetan Plateau Data Center (tpdc.ac.cn)
    Authors
    Xinhua ZHANG
    Area covered
    Description

    Data content: empirical formula calculation data for the final bottom elevation of a dam breach.
    Data source: a large database containing 1,230 dam cases around the world, compiled through literature retrieval.
    Collection method: processing and fitting with Excel.
    Data quality description: to solve the problem of assigning the final bottom elevation of a dam breach, the collected dam-height and breach-depth data in the dam database were combined with the classification of dam-body erosion during overtopping failure proposed by Briaud in 2008, and the dams were divided into three erosion classes: high, medium and low. The dam height and breach depth of barrier (landslide) dams in each erosion class were then regressed, empirical formulas for breach depth under the different erosion degrees were fitted, and the final bottom elevations of the dam breaches were determined.
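    The per-erosion-class fitting procedure described above can be sketched in Python; the height and depth values below are placeholders, not entries from the 1,230-case database, and the linear form is only one possible choice of empirical formula.

    import numpy as np

    # Placeholder (dam height, breach depth) observations in metres, grouped by erosion class.
    cases = {
        "high":   {"height": np.array([20, 35, 50, 80]),  "depth": np.array([18, 30, 44, 70])},
        "medium": {"height": np.array([25, 40, 60, 90]),  "depth": np.array([15, 24, 36, 55])},
        "low":    {"height": np.array([30, 45, 70, 100]), "depth": np.array([8, 12, 20, 28])},
    }

    # Fit a simple empirical formula, breach depth = a * dam height + b, for each erosion class;
    # the final breach bottom elevation follows as crest elevation minus the fitted breach depth.
    for erosion, data in cases.items():
        a, b = np.polyfit(data["height"], data["depth"], 1)
        print(f"{erosion:>6}: breach depth ≈ {a:.2f} * dam height + {b:.2f} m")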
