29 datasets found
  1. Downloadstatistik GESIS Datenarchiv

    • search.gesis.org
    • da-ra.de
    Updated Feb 14, 2019
    Cite
    GESIS - Data Archive for the Social Sciences (2019). Downloadstatistik GESIS Datenarchiv [Dataset]. http://doi.org/10.4232/1.13222
    Explore at:
    Available download formats: application/x-spss-sav (2154811), application/x-stata-dta (5384365), (2139418), application/x-spss-sav (2295631), (2051697)
    Dataset updated
    Feb 14, 2019
    Dataset provided by
    GESIS Data Archive
    GESIS search
    Authors
    GESIS - Data Archive for the Social Sciences
    License

    https://www.gesis.org/en/institute/data-usage-terms

    Time period covered
    Jan 1, 2004 - Dec 31, 2018
    Variables measured
    za_nr - Archive study number, doi - Digital Object Identifier, version - GESIS Archive Version, Access - Access category (0, A, B, C, D, E), Title - English study title (if n.a., German title), Title_DE - German study title (if n.a., English title), Total - All downloads combined (all years, all sources), d_2004_dbk - All DBK downloads from that respective year, d_2005_dbk - All DBK downloads from that respective year, d_2006_dbk - All DBK downloads from that respective year, and 63 more
    Description

    General information: The data sets contain information on how often materials of studies available through GESIS: Data Archive for the Social Sciences were downloaded and/or ordered through one of the archive's platforms/services between 2004 and 2018.

    Sources and platforms: Study materials are accessible through various GESIS platforms and services: Data Catalogue (DBK), histat, datorium, data service (and others).

    Years available: Data Catalogue: 2012-2018; data service: 2006-2018; datorium: 2014-2018; histat: 2004-2018

    Data sets: Data set ZA6899_Datasets_only_all_sources contains information on how often data files, such as those with a dta (Stata) or sav (SPSS) extension, have been downloaded. Identification of data files is handled semi-automatically (depending on the platform/service). Multiple downloads of one file by the same user (identified through IP address, or username for registered users) on the same day are counted as only one download.

    Data set ZA6899_Doc_and_Data_all_sources contains information on how often study materials have been downloaded. Multiple downloads of any file of the same study by the same user (identified through IP address, or username for registered users) on the same day are counted as only one download.

    Both data sets are available in three formats: csv (quoted, semicolon-separated), dta (Stata v13, labeled) and sav (SPSS, labeled). All formats contain identical information.

    Variables: Variables/columns in both data sets are identical.
    • za_nr 'Archive study number'
    • version 'GESIS Archive Version'
    • doi 'Digital Object Identifier'
    • StudyNo 'Study number of respective study'
    • Title 'English study title'
    • Title_DE 'German study title'
    • Access 'Access category (0, A, B, C, D, E)'
    • PubYear 'Publication year of last version of the study'
    • inZACAT 'Study is currently also available via ZACAT'
    • inHISTAT 'Study is currently also available via HISTAT'
    • inDownloads 'There are currently data files available for download for this study in DBK or datorium'
    • Total 'All downloads combined'
    • downloads_2004 to downloads_2018 'downloads/orders from all sources combined in the respective year'
    • d_2004_dbk to d_2018_dbk 'downloads from source dbk in the respective year'
    • d_2004_histat to d_2018_histat 'downloads from source histat in the respective year'
    • d_2004_dataservice to d_2018_dataservice 'downloads/orders from source dataservice in the respective year'

    More information is available within the codebook.
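    For orientation, a minimal sketch of loading the semicolon-separated csv export with pandas; the file name and encoding are assumptions, while the column names (za_nr, Title, Total) come from the variable list above:

    ```python
    import pandas as pd

    # Load the quoted, semicolon-separated csv export (file name assumed
    # from the description above; adjust to the actual download).
    df = pd.read_csv(
        "ZA6899_Datasets_only_all_sources.csv",
        sep=";",
        quotechar='"',
        encoding="utf-8",  # assumption; check the codebook
    )

    # Ten most-downloaded studies across all years and sources.
    top10 = df.sort_values("Total", ascending=False).head(10)
    print(top10[["za_nr", "Title", "Total"]])
    ```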

  2. Vehicle licensing statistics data files

    • gov.uk
    • s3.amazonaws.com
    Updated Jun 11, 2025
    Cite
    Department for Transport (2025). Vehicle licensing statistics data files [Dataset]. https://www.gov.uk/government/statistical-data-sets/vehicle-licensing-statistics-data-files
    Explore at:
    Dataset updated
    Jun 11, 2025
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department for Transport
    Description

    Recent changes

    A number of changes were introduced to these data files in the 2022 release to help meet the needs of our users and to provide more detail.

    Fuel type has been added to:

    • df_VEH0120_GB
    • df_VEH0120_UK
    • df_VEH0160_GB
    • df_VEH0160_UK

    Historic UK data has been added to:

    • df_VEH0124 (now split into 2 files)
    • df_VEH0220
    • df_VEH0270

    A new data file, df_VEH0520, has been added.

    We welcome any feedback on the structure of our data files, their usability, or any suggestions for improvements; please contact the vehicles statistics team.

    How to use CSV files

    CSV files can be used either as a spreadsheet (using Microsoft Excel or similar spreadsheet packages) or digitally using software packages and languages (for example, R or Python).

    When used as a spreadsheet, there will be no formatting, but the file can still be explored like our publication tables. Due to their size, older software might not be able to open the entire file.
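    Since older software may struggle with the file sizes, a chunked read keeps memory use low; a minimal sketch with pandas, assuming a local copy of df_VEH0120_GB.csv and using the BodyType column from the schema listed below:

    ```python
    import pandas as pd

    # Stream the large csv in chunks instead of loading it all at once.
    totals = {}
    for chunk in pd.read_csv("df_VEH0120_GB.csv", chunksize=100_000):
        # Example aggregation: count rows per body type.
        for body_type, count in chunk["BodyType"].value_counts().items():
            totals[body_type] = totals.get(body_type, 0) + int(count)

    print(totals)
    ```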

    Download data files

    Make and model by quarter

    df_VEH0120_GB: Vehicles at the end of the quarter by licence status, body type, make, generic model and model: Great Britain (CSV, 58.1 MB, https://assets.publishing.service.gov.uk/media/68494aca74fe8fe0cbb4676c/df_VEH0120_GB.csv)

    Scope: All registered vehicles in Great Britain; from 1994 Quarter 4 (end December)

    Schema: BodyType, Make, GenModel, Model, Fuel, LicenceStatus, [number of vehicles; 1 column per quarter]

    df_VEH0120_UK: Vehicles at the end of the quarter by licence status, body type, make, generic model and model: United Kingdom (CSV, 34.1 MB, https://assets.publishing.service.gov.uk/media/68494acb782e42a839d3a3ac/df_VEH0120_UK.csv)

    Scope: All registered vehicles in the United Kingdom; from 2014 Quarter 3 (end September)

    Schema: BodyType, Make, GenModel, Model, Fuel, LicenceStatus, [number of vehicles; 1 column per quarter]

    df_VEH0160_GB: Vehicles registered for the first time by body type, make, generic model and model: Great Britain (CSV, 24.8 MB, https://assets.publishing.service.gov.uk/media/68494ad774fe8fe0cbb4676d/df_VEH0160_GB.csv)

    Scope: All vehicles registered for the first time in Great Britain; from 2001 Quarter 1 (January to March)

    Schema: BodyType, Make, GenModel, Model, Fuel, [number of vehicles; 1 column per quarter]

    df_VEH0160_UK: Vehicles registered for the first time by body type, make, generic model and model: United Kingdom (CSV, 8.26 MB, https://assets.publishing.service.gov.uk/media/68494ad7aae47e0d6c06e078/df_VEH0160_UK.csv)

    Scope: All vehicles registered for the first time in the United Kingdom; from 2014 Quarter 3 (July to September)

    Schema: BodyType, Make, GenModel, Model, Fuel, [number of vehicles; 1 column per quarter]

    Make and model by age

    In order to keep the datafile df_VEH0124 to a reasonable size, it has been split into 2 halves: one covering makes starting with A to M, and the other covering makes starting with N to Z.

    df_VEH0124_AM: https://assets.

  3. Supplementary file including normalized data sets to reproduce the analyses...

    • data.ub.uni-muenchen.de
    Updated Nov 29, 2019
    Cite
    (2019). Supplementary file including normalized data sets to reproduce the analyses presented in the paper "Use of pre-transformation to cope with extreme values in important candidate features" by Boulesteix, Guillemot & Sauerbrei (Biometrical Journal, 2011) [Dataset]. http://doi.org/10.5282/ubm/data.39
    Explore at:
    Dataset updated
    Nov 29, 2019
    Description

    The zip-file contains supplementary files (normalized data sets and R code) to reproduce the analyses presented in the paper "Use of pre-transformation to cope with extreme values in important candidate features" by Boulesteix, Guillemot & Sauerbrei (Biometrical Journal, 2011). The raw data (CEL files) are publicly available and described in the following papers:
    • Ancona et al, 2006. On the statistical assessment of classifiers using DNA microarray data. BMC Bioinformatics 7, 387.
    • Miller et al, 2005. An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. Proceedings of the National Academy of Science 102, 13550–13555.
    • Minn et al, 2005. Genes that mediate breast cancer metastasis to lung. Nature 436, 518–524.
    • Pawitan et al, 2005. Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts. Breast Cancer Research 7, R953–964.
    • Scherzer et al, 2007. Molecular markers of early Parkinson's disease based on gene expression in blood. Proceedings of the National Academy of Science 104, 955-960.
    • Singh et al, 2002. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1, 203–209.
    • Sotiriou et al, 2006. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. Journal of the National Cancer Institute 98, 262–272.
    • Tang et al, 2009. Gene-expression profiling of peripheral blood mononuclear cells in sepsis. Critical Care Medicine 37, 882–888.
    • Wang et al, 2005. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 365, 671–679.
    • Irizarry, 2003. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res 31 (4), e15.
    • Irizarry et al, 2006. Comparison of Affymetrix GeneChip expression measures. Bioinformatics 22 (7), 789–794.

  4. Executive Functioning Data

    • openneuro.org
    Updated Dec 5, 2022
    Cite
    Tracy Brandmeyer; Arnaud Delorme (2022). Executive Functioning Data [Dataset]. http://doi.org/10.18112/openneuro.ds004350.v1.0.0
    Explore at:
    Dataset updated
    Dec 5, 2022
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Tracy Brandmeyer; Arnaud Delorme
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Executive Functioning Tasks

    The data in this dataset were collected as part of an executive functioning battery consisting of three separate tasks:

    1) N-Back (NB)

    2) Sustained Attention to Response Task (SART)

    3) Local Global (LG)

    Details of the original experiment in which these tasks were conducted can be found here (https://doi.org/10.3389/fnhum.2020.00246).

    Experiment Design: Two sessions of each task were conducted on the first and last day of the neurofeedback experiment with 24 participants (mentioned above).

    [N-Back (NB)] Participants performed a visual sequential letter n-back working memory task, with memory load ranging from 1-back to 3-back. The visual stimuli consisted of a sequence of 4 letters (A, B, C, D) presented in black on a gray background. Participants observed stimuli on a visual display and responded using the spacebar on a provided keyboard. In the 1-back condition, the target was any letter identical to the one presented in the immediately preceding trial. In the 2-back and 3-back conditions, the target was any letter that was presented two or three trials back, respectively. The stimuli were presented on a screen for a duration of 1 s, after which a fixation cross was presented for 500 ms. Participants responded to each stimulus by pressing the spacebar with their right hand upon target presentation. If no spacebar was pressed within 1500 ms of the stimulus presentation, a new stimulus was presented. Each n-back condition (1-, 2-, and 3-back) consisted of the presentation of 280 stimuli selected randomly from the 4-letter pool.

    [Sustained Attention to Response Task (SART)] Participants were presented with a series of single numerical digits (randomly selected from 0 to 9 - the same digit could not be presented twice in a row) and instructed to press the spacebar for each digit, except for when presented with the digit 3. Each number was presented for 400 ms in white on a gray background. The inter-stimulus interval was 2 s irrespective of the button press and a fixation cross was present at all times except for when the digits were presented. Participants performed the SART for approximately 10 minutes corresponding to 250 digit presentations.

    [Local Global (LG)] Participants were shown large letters (H and T) on a computer screen. The large letters were made up of an aggregate of smaller letters that could be congruent (i.e. large H made of small Hs or large T made of small Ts) or incongruent (large H made of small Ts or large T made of small Hs) with respect to the large letter. The small letters were 0.8 cm high and the large letters were 8 cm high on the computer screen. A fixation cross was present at all times except when the stimulus letters were presented. Letters were shown on the computer screen until the subject responded. After each response, there was a delay of 1 s before the next stimulus was presented. Before each sequence of letters, instructions were shown on a computer screen indicating to participants whether they should respond to the small (local condition) or large (global condition) letters. The participants were instructed to categorize either the large or the small letters and to press the letter H or T on the computer keyboard to indicate their choice.

    Data Processing: Data processing was performed in Matlab and EEGLAB. The EEG data was down-sampled from 2048 to 256 Hz, a 1 Hz high-pass filter (elliptical, non-linear) was applied, and the data was then average referenced.

    Note: The data files in this dataset were converted into the .set format for EEGLAB. The .bdf files that were converted for each of the tasks can be found in the sourcedata folder.
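    The converted .set files can also be read outside Matlab/EEGLAB; a minimal sketch using MNE-Python, with a hypothetical file name:

    ```python
    import mne

    # Load one EEGLAB .set file (path is hypothetical; point it at a run
    # from this dataset, e.g. an N-Back recording).
    raw = mne.io.read_raw_eeglab("sub-01_task-nback_eeg.set", preload=True)

    print(raw.info["sfreq"])       # expect 256 Hz after the down-sampling above
    print(raw.info["ch_names"][:5])
    ```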

    Exclusion Note: The second run of NB in session 1 of sub-11 and the run of SART in session 1 of sub-18 were both excluded due to issues with conversion to .set format. However, the .bdf files of these runs can be found in the sourcedata folder.

  5. Waterworks — intake point_reporting

    • data.europa.eu
    • gimi9.com
    unknown
    Updated Feb 7, 2022
    Cite
    (2022). Waterworks — intake point_reporting [Dataset]. https://data.europa.eu/data/datasets/https-data-norge-no-node-1499
    Explore at:
    Available download formats: unknown
    Dataset updated
    Feb 7, 2022
    License

    https://data.norge.no/nlod/en/2.0/

    Description

    The data sets provide an overview of selected data on waterworks registered with the Norwegian Food Safety Authority. The information has been reported by the waterworks through application processing or other reporting to the Norwegian Food Safety Authority. Drinking water regulations require, among other things, annual reporting, and the Norwegian Food Safety Authority has created a separate form service for such reporting. The data sets include public or private waterworks that supply 50 people or more. In addition, all municipally owned businesses with their own water supply are included regardless of size. The data sets also contain decommissioned facilities, for those who wish to view historical data, i.e. data for previous years.

    There are data sets for the following supervisory objects:
    1. Water supply system (also includes analysis of drinking water)
    2. Transport system
    3. Treatment facility
    4. Intake point (also includes analysis of the water source)

    Below you will find datasets for: 4. Intake point_reporting. In addition, there is a file (information.txt) that provides an overview of when the extracts were produced and how many lines there are in the individual files. The extracts are produced weekly.

    Furthermore, for the data sets water supply system, transport system and intake point, it is possible to see historical data on what is included in the annual reporting. To make use of that information, the file must be linked to the "mother" file to get names and other static information. These files have the _reporting ending in the file name. Descriptions of the data fields (i.e. metadata) in the individual data sets appear in separate files, available in pdf format.

    If you double-click the csv file and it opens directly in Excel, the Norwegian characters æ, ø and å will not display correctly. To see the character set correctly in Excel, you must:
    • start Excel with a new spreadsheet, select Data and then From Text, and press Import
    • select delimited data and file origin 65001: Unicode (UTF-8), tick "My data has headers", and press Next
    • remove tab as separator, select semicolon as separator, press Next, and complete the import

    Alternatively, the data sets can be imported into a separate database and compiled as desired. There are link keys in the files that make it possible to link the files together. The waterworks are responsible for the quality of the datasets.
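    The csv extracts can also be read programmatically; a minimal sketch in Python with pandas, applying the UTF-8 encoding and semicolon separator described above (the file name is hypothetical):

    ```python
    import pandas as pd

    # Read one weekly extract; encoding and separator follow the
    # description above. The file name is hypothetical.
    df = pd.read_csv("intake_point_reporting.csv", sep=";", encoding="utf-8")

    print(df.head())  # æ, ø and å should display correctly
    ```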

    Purpose: Make data for drinking water supply available to the public.

  6. Vehicle licensing statistics data tables

    • gov.uk
    • s3.amazonaws.com
    Updated Jun 11, 2025
    Cite
    Department for Transport (2025). Vehicle licensing statistics data tables [Dataset]. https://www.gov.uk/government/statistical-data-sets/vehicle-licensing-statistics-data-tables
    Explore at:
    Dataset updated
    Jun 11, 2025
    Dataset provided by
    GOV.UK
    Authors
    Department for Transport
    Description

    Data files containing detailed information about vehicles in the UK are also available, including make and model data.

    Some tables have been withdrawn and replaced. The table index for this statistical series has been updated to provide a full map between the old and new numbering systems used in this page.

    Tables VEH0101 and VEH1104 have not yet been revised to include the recent changes to Large Goods Vehicles (LGV) and Heavy Goods Vehicles (HGV) definitions for data earlier than 2023 quarter 4. This will be amended as soon as possible.
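    The tables listed below are ODS spreadsheets, which pandas can read through the odfpy engine; a minimal sketch, assuming a local copy of veh0101.ods:

    ```python
    import pandas as pd

    # Reading ODS requires the odfpy package (pip install odfpy).
    # sheet_name=None returns every sheet as a separate DataFrame.
    sheets = pd.read_excel("veh0101.ods", engine="odf", sheet_name=None)

    print(list(sheets))  # inspect the sheet names before picking one
    ```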

    All vehicles

    Licensed vehicles

    Overview

    VEH0101: Vehicles at the end of the quarter by licence status and body type: Great Britain and United Kingdom (ODS, 151 KB, https://assets.publishing.service.gov.uk/media/6846e8dc57f3515d9611f119/veh0101.ods)

    Detailed breakdowns

    VEH0103: Licensed vehicles at the end of the year by tax class: Great Britain and United Kingdom (ODS, 33 KB, https://assets.publishing.service.gov.uk/media/6846e8dcd25e6f6afd4c01d5/veh0103.ods)

    VEH0105: Licensed vehicles at the end of the quarter by body type, fuel type, keepership (private and company) and upper and lower tier local authority: Great Britain and United Kingdom (ODS, 16.3 MB, https://assets.publishing.service.gov.uk/media/6846e8dd57f3515d9611f11a/veh0105.ods)

    VEH0206: Licensed cars at the end of the year by VED band and carbon dioxide (CO2) emissions: Great Britain and United Kingdom (ODS, 42.3 KB, https://assets.publishing.service.gov.uk/media/6846e8dee5a089417c806179/veh0206.ods)

    VEH0601: Licensed buses and coaches at the end of the year by body type detail: Great Britain and United Kingdom (ODS, 24.6 KB, https://assets.publishing.service.gov.uk/media/6846e8df5e92539572806176/veh0601.ods)

    VEH1102: Licensed vehicles at the end of the year by body type and keepership (private and company): Great Britain and United Kingdom (ODS, 146 KB, https://assets.publishing.service.gov.uk/media/6846e8e0e5a089417c80617b/veh1102.ods)

    VEH1103: Licensed vehicles at the end of the quarter by body type and fuel type: Great Britain and United Kingdom (ODS, 992 KB, https://assets.publishing.service.gov.uk/media/6846e8e0e5a089417c80617c/veh1103.ods)

    VEH1104: Licensed vehicles at the end of the (https://assets.publishing.service.gov.uk/media/6846e8e15e92539572806177/veh1104.ods)

  7. Supplementary data files for the PhD thesis "Dancing the Vibe: Designerly...

    • data.4tu.nl
    Updated Jun 12, 2024
    Cite
    Alev Sönmez; Pieter M. A. Desmet; Natalia Romero Herrera (2024). Supplementary data files for the PhD thesis "Dancing the Vibe: Designerly Exploration of Group Mood in Work Settings" [Dataset]. http://doi.org/10.4121/c0202ba2-bdbe-4790-8b80-74baf3a08b4b.v1
    Explore at:
    Dataset updated
    Jun 12, 2024
    Dataset provided by
    4TU.ResearchData
    Authors
    Alev Sönmez; Pieter M. A. Desmet; Natalia Romero Herrera
    License

    https://data.4tu.nl/info/fileadmin/user_upload/Documenten/4TU.ResearchData_Restricted_Data_2022.pdf

    Time period covered
    Oct 2019 - Sep 2021
    Dataset funded by
    The Netherlands Organization for Scientific Research (NWO)
    Description

    This dataset comprises five sets of data collected throughout Alev Sönmez's PhD thesis project: Sönmez, A. (2024). Dancing the Vibe: Designerly Exploration of Group Mood in Work Settings. (Doctoral dissertation in review). Delft University of Technology, Delft, the Netherlands.


    This thesis aims to contribute to the granular understanding of group mood by achieving three objectives, each representing a key research question in the project: (1) to develop a descriptive overview of nuanced group moods, (2) to develop knowledge and tools to effectively communicate nuanced group moods, and (3) to develop knowledge and insights to facilitate reflection on group mood. The research was guided by the following research questions: (1) What types of group moods are experienced in small work groups? (2) How can nuanced group moods be effectively communicated? (3) How can group mood reflection be facilitated?


    This research was supported by VICI grant number 453-16-009 from The Netherlands Organization for Scientific Research (NWO), Division for the Social and Behavioral Sciences, awarded to Pieter M. A. Desmet.


    The data is organized into folders corresponding to the chapters of the thesis. Each folder contains a README file with specific information about the dataset.


    Capter_2_PhenomenologicalStudy: This dataset consists of anonymized transcriptions of co-inquiry sessions in which 5 small project groups described the group moods they experienced in their eight most recent meetings. Additionally, we share the observation notes we collected in those meetings, the maps filled in during the co-inquiry sessions, the materials used to collect data, and the coding scheme used to analyze the group mood descriptions.


    Chapter_3_ImageEvaluationStudy: This dataset consists of anonymized scores from 38 participants indicating the strength of the association between eight group mood–expressing images and 36 group mood qualities, along with their free descriptions of the group moods perceived in those images. Additionally, we share the questionnaire design, the eight images, and the data processing files (t-test, correspondence analysis outputs, free description coding, heat map).


    Chapter_4_VideoEvaluationStudy: This dataset consists of anonymized scores from 40 participants indicating the strength of the association between eight group mood–expressing videos and 36 group mood qualities, along with their free descriptions of the group moods perceived in those videos. Additionally, we share the questionnaire design, the data processing files (t-test, correspondence analysis outputs, free description coding, heat map), and the files used to compare the image and video sets (PCA output and image–video HIT rate comparison table).


    Chapter_5_CardsetInterventionStudy: This dataset consists of anonymized written responses from each of the 12 project teams, along with notes taken during a plenary session with these teams, evaluating the efficacy of the intervention on their group mood management.


    Chapter_6_WorkshopEvaluationStudy: This dataset consists of anonymized transcriptions of five small work teams reflecting on their lived group mood experiences following the steps of an embodiment workshop we designed, including their takeaways from the workshop and discussions evaluating the workshop's efficacy in stimulating reflection, as well as the overall experience of the workshop.


    All the data is anonymized by removing the names of individuals and institutions. However, the interviews contain details where participants shared personal information about themselves, colleagues, and company dynamics. Therefore, the data should be handled with extra care to ensure that participant privacy is not put in danger. Contact N.A.Romero@tudelft.nl (Natalia Romero Herrera) to request access to the dataset.

  8. Data from: Consumer Expenditure Survey, 2004: Diary Survey

    • icpsr.umich.edu
    Updated Aug 1, 2013
    Cite
    United States Department of Labor. Bureau of Labor Statistics (2013). Consumer Expenditure Survey, 2004: Diary Survey [Dataset]. http://doi.org/10.3886/ICPSR04415.v2
    Explore at:
    Dataset updated
    Aug 1, 2013
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    Authors
    United States Department of Labor. Bureau of Labor Statistics
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/4415/terms

    Time period covered
    2004
    Area covered
    United States
    Description

    The Consumer Expenditure Survey (CE) program provides a continuous and comprehensive flow of data on the buying habits of American consumers, including data on their expenditures, income, and consumer unit (families and single consumers) characteristics. These data are used widely in economic research and analysis, and in support of revisions of the Consumer Price Index. The CE program comprises two separate components (each with its own survey questionnaire and independent sample): the Diary Survey and the quarterly Interview Survey (ICPSR 4416).

    This data collection contains the Diary Survey data, which was designed to obtain data on frequently purchased smaller items, including food and beverages (both at home and in food establishments), gasoline, housekeeping supplies, tobacco, nonprescription drugs, and personal care products and services. Each consumer unit (CU) recorded its expenditures in a diary for two consecutive 1-week periods. Although the diary was designed to collect information on expenditures that could not be easily recalled over time, respondents were asked to report all expenses (except overnight travel) that the CU incurred during the survey week.

    The microdata in this collection are available as SAS, SPSS, and STATA datasets or ASCII comma-delimited files. The 2004 Diary release contains five sets of data files (FMLY, MEMB, EXPN, DTAB, DTAB_IMPUTE) and three processing files. The FMLY, MEMB, EXPN, DTAB, and DTAB_IMPUTE files are organized by the quarter of the calendar year in which the data were collected; there are four quarterly datasets for each of these files. The FMLY files contain CU characteristics, income, and summary-level expenditures; the MEMB files contain member characteristics and income data; the EXPN files contain detailed weekly expenditures at the Universal Classification Code (UCC) level; the DTAB files contain the CU's reported income values or the mean of the five imputed income values in the multiple imputation method; and the DTAB_IMPUTE files contain the five imputed income values. Please note that the summary-level expenditure and income information on the FMLY files permits the data user to link consumer spending, by general expenditure category, with household characteristics and demographics on one set of files.

    The three processing files enhance computer processing and tabulation of data, and provide descriptive information on item codes. They are:
    (1) an aggregation scheme file used in the published consumer expenditure tables (DSTUB),
    (2) a UCC file that contains UCCs and their abbreviated titles, identifying the expenditure, income, or demographic item represented by each UCC, and
    (3) a sample program file that contains the computer program used in Section VII "MICRODATA VERIFICATION AND ESTIMATION METHODOLOGY" of the Diary User Guide.
    The processing files are further explained in Section III.E.5. "PROCESSING FILES" of the same User Guide documentation. There is also a second user guide, User's Guide to Income Imputation in the CE, which includes information on how to appropriately use the imputed income data.

    Demographic and family characteristics data include age, sex, race, marital status, and CU relationships for each CU member. Income information, such as wage, salary, unemployment compensation, child support, and alimony, as well as information on the employment of each CU member age 14 and over, was also collected.
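    A minimal sketch of loading one quarterly FMLY file from the Stata distribution with pandas; the file name is hypothetical, so check the Diary User Guide for the actual naming:

    ```python
    import pandas as pd

    # Load one quarterly FMLY file (file name hypothetical). The FMLY files
    # carry CU characteristics, income, and summary-level expenditures, so
    # spending can be linked to household demographics on one table.
    fmly = pd.read_stata("fmly041.dta")

    print(fmly.shape)
    print(list(fmly.columns[:10]))
    ```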

  9. Simulated data for: "Statistical Modeling with Litter as a Random Effect in...

    • data.mendeley.com
    Updated Sep 16, 2019
    Cite
    Christina Sobin (2019). Simulated data for: "Statistical Modeling with Litter as a Random Effect in Mixed Models to Manage 'Intralitter Likeness'"; demonstration of including litter as a random effect [Dataset]. http://doi.org/10.17632/y42c2mpr8f.1
    Explore at:
    Dataset updated
    Sep 16, 2019
    Authors
    Christina Sobin
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data files here included are simulated data, the ranges for which were based on data from studies of C57BL6J mice, previously analyzed and published. The files include one data set of N = 180 from 30 litters, 3 treatment groups, 10 litters per treatment, 6 mice per litter, with equal numbers of males (n = 3) and females (n = 3) per litter. Also provided are 4 data sets of N = 60 each, that were drawn from the larger data set of N = 180, and which include 1 "representative" male and 1 "representative" female per litter. Each data set includes five variables: mouse ID, litter, sex, group, and body weight at post-natal day 21.

    *These datasets were created solely for the purpose of demonstrating differences in statistical modeling when litter clusters complicate data analysis. Please see complete manuscript for a full discussion of the topic.
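    As context for the intended comparison, a minimal sketch of fitting a linear mixed model with litter as a random intercept using statsmodels; the file and column names are assumptions to be matched against the variables listed above:

    ```python
    import pandas as pd
    import statsmodels.formula.api as smf

    # Load the simulated data (file name hypothetical) and fit a linear
    # mixed model with litter as a random effect. Column names (weight,
    # group, sex, litter) are assumptions based on the description above.
    df = pd.read_csv("simulated_litter_data.csv")

    model = smf.mixedlm("weight ~ group + sex", data=df, groups=df["litter"])
    result = model.fit()
    print(result.summary())
    ```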

  10. Data from: RadarScenes: A Real-World Radar Point Cloud Data Set for...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 7, 2021
    Cite
    Wöhler, Christian (2021). RadarScenes: A Real-World Radar Point Cloud Data Set for Automotive Applications [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_4559820
    Explore at:
    Dataset updated
    Apr 7, 2021
    Dataset provided by
    Schumann, Ole
    Wöhler, Christian
    Tilly, Julius
    Hahn, Markus
    Scheiner, Nicolas
    Weishaupt, Fabio
    Dickmann, Jürgen
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The RadarScenes data set (“data set”) contains recordings from four automotive radar sensors, which were mounted on one measurement-vehicle. Images from one front-facing documentary camera are added.

    The data set has a length of over 4h and in addition to the point cloud data from the radar sensors, semantic annotations on a point-wise level from 12 different classes are provided.

    In addition to point-wise class labels, a track-id is attached to each individual detection of a dynamic object, so that individual objects can be tracked over time.

    Structure of the Data Set

    The data set consists of 158 individual sequences. For each sequence, the recorded data from radar and odometry sensors are stored in one hdf5 file. Each of these files is accompanied by a json file called "scenes.json" in which meta-information is stored. In a subfolder, the camera images are stored as jpg files.

    Two additional json files give further meta-information: in the "sensors.json" file, the sensor mounting positions and rotation angles are defined. In the file "sequences.json", all recorded sequences are listed with additional information, e.g. about the recording duration.

    sensors.json

    This file describes the position and orientation of the four radar sensors. Each sensor is attributed with an integer id. The mounting position is given relative to the center of the rear axle of the vehicle. This allows for an easier calculation of the ego-motion at the position of the sensors. Only the x and y position is given, since no elevation information is provided by the sensors. Similarly, only the yaw-angle for the rotation is needed.

    sequences.json

    This file contains one entry for each recorded sequence. Each entry is built from the following information: the category (training or validation of machine learning algorithms), the number of individual scenes within the sequence, the duration in seconds and the names of the sensors which performed measurements within this sequence.

    scenes.json

    In this file, meta-information for a specific sequence and the scenes within this sequence are stored.

    The top-level dictionary lists the name of the sequence, the group of this sequence (training or validation), and the timestamps of the first and last time a radar sensor performed a measurement in this sequence.

    A scene is defined as one measurement of one of the four radar sensors. For each scene, the sensor id of the respective radar sensor is listed. Each scene has one unique timestamp, namely the time at which the radar sensor performed the measurement. Four timestamps of different radar measurement are given for each scene: the next and previous timestamp of a measurement of the same sensor and the next and previous timestamp of a measurement of any radar sensor. This allows to quickly iterate over measurements from all sensors or over all measurements of a single sensor. For the association with the odometry information, the timestamp of the closest odometry measurement and additionally the index in the odometry table in the hdf5 file where this measurement can be found are given. Furthermore, the filename of the camera image whose timestamp is closest to the radar measurement is given. Finally, the start and end indices of this scene’s radar detections in the hdf5 data set “radar_data” is given. The first index corresponds to the row in the hdf5 data set in which the first detection of this scene can be found. The second index corresponds to the row in the hdf5 data set in which the next scene starts. That is, the detection in this row is the first one that does not belong to the scene anymore. This convention allows to use the common python indexing into lists and arrays, where the second index is exclusive: arr[start:end].
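    A minimal sketch of using those start and end indices to slice one scene's detections out of the hdf5 file; the exact key names inside scenes.json are assumptions based on this description:

    ```python
    import json
    import h5py

    # Key names inside scenes.json ("scenes", "radar_indices") are
    # assumptions based on the description above; adjust to the real file.
    with open("scenes.json") as f:
        meta = json.load(f)

    with h5py.File("radar_data.h5", "r") as h5:
        radar_data = h5["radar_data"]
        for timestamp, scene in meta["scenes"].items():
            start, end = scene["radar_indices"]  # end is exclusive, as described
            detections = radar_data[start:end]
            print(timestamp, len(detections))
            break  # first scene only, for the sketch
    ```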

    radar_data.h5

    In this file, both the radar and the odometry data are stored. Two data sets exist within this file: "odometry" and "radar_data".

    The “odometry” data has six columns: timestamp, x_seq, y_seq, yaw_seq, vx, yaw_rate. Each row corresponds to one measurement of the driving state. The columns x_seq, y_seq and yaw_seq describe the position and orientation of the ego-vehicle relative to some global origin. Hence, the pose in a global (sequence) coordinate system is defined. The column “vx” contains the velocity of the ego-vehicle in x-direction and the yaw_rate column contains the current yaw rate of the car.

    The hdf5 data set “radar_data” is composed of the individual detections. Each row in the data set corresponds to one detection. A detection is defined by the following signals, each being listed in one column:

    timestamp: in microseconds, relative to some arbitrary origin

    sensor_id: integer value, id of the sensor that recorded the detection

    range_sc: in meters, radial distance to the detection, sensor coordinate system

    azimuth_sc: in radians, azimuth angle to the detection, sensor coordinate system

    rcs: in dBsm, RCS value of the detection

    vr: in m/s, radial velocity measured for this detection

    vr_compensated: in m/s, radial velocity for this detection, compensated for the ego-motion

    x_cc and y_cc: in m, position of the detection in the car-coordinate system (origin is at the center of the rear-axle)

    x_seq and y_seq: in m, position of the detection in the global sequence-coordinate system (origin is at an arbitrary start point)

    uuid: unique identifier for the detection. Can be used for association with predicted labels and debugging

    track_id: id of the dynamic object this detection belongs to. Empty, if it does not belong to any.

    label_id: semantic class id of the object to which this detection belongs: passenger cars (0), large vehicles (like agricultural or construction vehicles) (1), trucks (2), buses (3), trains (4), bicycles (5), motorized two-wheelers (6), pedestrians (7), groups of pedestrians (8), animals (9), all other dynamic objects encountered while driving (10), and the static environment (11); see the lookup table below
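    For convenience, the label_id values above as a Python lookup table:

    ```python
    # label_id values and their semantic classes, as listed above.
    LABEL_NAMES = {
        0: "passenger car",
        1: "large vehicle",
        2: "truck",
        3: "bus",
        4: "train",
        5: "bicycle",
        6: "motorized two-wheeler",
        7: "pedestrian",
        8: "group of pedestrians",
        9: "animal",
        10: "other dynamic object",
        11: "static environment",
    }
    ```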

    Camera Images

    The images of the documentary camera are located in the subfolder “camera” of each sequence. The filename of each image corresponds to the timestamp at which the image was recorded.

    The data set is a radar data set. Camera images are only included so that users of the data set get a better understanding of the recorded scenes. However, due to GDPR requirements, personal information was removed from these images via re-painting of regions proposed by a semantic instance segmentation network and manual correction. The networks were optimized for high recall values so that false-negatives were suppressed at the cost of having false positive markings. As the camera images are only meant to be used as guidance to the recorded radar scenes, this shortcoming has no negative effect on the actual data.

    Tools

    Some helper tools - including a viewer - can be found in the python package radar_scenes. Details can be found here: https://github.com/oleschum/radar_scenes

    Publications

    Previous publications related to classification algorithms on radar data already used this data set:

    Scene Understanding With Automotive Radar; https://ieeexplore.ieee.org/document/8911477

    Semantic Segmentation on Radar Point Clouds, https://ieeexplore.ieee.org/document/8455344

    Off-the-shelf sensor vs. experimental radar - How much resolution is necessary in automotive radar classification?, https://ieeexplore.ieee.org/document/9190338

    Detection and Tracking on Automotive Radar Data with Deep Learning, https://ieeexplore.ieee.org/document/9190261

    Comparison of random forest and long short-term memory network performances in classification tasks using radar, https://ieeexplore.ieee.org/document/8126350

    License

    The data set is licensed under Creative Commons Attribution Non Commercial Share Alike 4.0 International (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode). Hence, the data set must not be used for any commercial use cases.

    Disclaimer

    The data set comes "AS IS", without express or implied warranty and/or any liability exceeding mandatory statutory obligations. This especially applies to any obligations of care or indemnification in connection with the data set. The annotations were created for our research purposes only, and no quality assessment was done for usage in products of any kind. We therefore cannot guarantee the correctness, completeness or reliability of the provided data set.

  11. Enhanced Historical Land-Use and Land-Cover Data Sets of the U.S. Geological...

    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    • data.usgs.gov
    • +4more
    Updated Nov 1, 2024
    Cite
    U.S. Geological Survey (2024). Enhanced Historical Land-Use and Land-Cover Data Sets of the U.S. Geological Survey: Data Source Index Polygons [Dataset]. https://res1catalogd-o-tdatad-o-tgov.vcapture.xyz/dataset/enhanced-historical-land-use-and-land-cover-data-sets-of-the-u-s-geological-survey-data-so
    Explore at:
    Dataset updated
    Nov 1, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    This polygon data set provides ancillary information to supplement a release of enhanced U.S. Geological Survey (USGS) historical land-use and land-cover data. The data set presents some of the original file-header documentation, as well as some details describing how the data files were used in the data release, in a geographic context.

  12. FSDKaggle2019

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1more
    Updated Jan 24, 2020
    Cite
    Frederic Font (2020). FSDKaggle2019 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3612636
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Eduardo Fonseca
    Daniel P. W. Ellis
    Xavier Serra
    Frederic Font
    Manoj Plakal
    Description

    FSDKaggle2019 is an audio dataset containing 29,266 audio files annotated with 80 labels of the AudioSet Ontology. FSDKaggle2019 has been used for the DCASE Challenge 2019 Task 2, which was run as a Kaggle competition titled Freesound Audio Tagging 2019.

    Citation

    If you use the FSDKaggle2019 dataset or part of it, please cite our DCASE 2019 paper:

    Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Serra. "Audio tagging with noisy labels and minimal supervision". Proceedings of the DCASE 2019 Workshop, NYC, US (2019)

    You can also consider citing our ISMIR 2017 paper, which describes how we gathered the manual annotations included in FSDKaggle2019.

    Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra, "Freesound Datasets: A Platform for the Creation of Open Audio Datasets", In Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017

    Data curators

    Eduardo Fonseca, Manoj Plakal, Xavier Favory, Jordi Pons

    Contact

    You are welcome to contact Eduardo Fonseca should you have any questions at eduardo.fonseca@upf.edu.

    ABOUT FSDKaggle2019

    Freesound Dataset Kaggle 2019 (or FSDKaggle2019 for short) is an audio dataset containing 29,266 audio files annotated with 80 labels of the AudioSet Ontology [1]. FSDKaggle2019 has been used for Task 2 of the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2019. Please visit the DCASE2019 Challenge Task 2 website for more information. This Task was hosted on the Kaggle platform as a competition titled Freesound Audio Tagging 2019. It was organized by researchers from the Music Technology Group (MTG) of Universitat Pompeu Fabra (UPF), and from the Sound Understanding team at Google AI Perception. The competition intended to provide insight towards the development of broadly-applicable sound event classifiers able to cope with label noise and minimal supervision conditions.

    FSDKaggle2019 employs audio clips from the following sources:

    Freesound Dataset (FSD): a dataset being collected at the MTG-UPF based on Freesound content organized with the AudioSet Ontology

    The soundtracks of a pool of Flickr videos taken from the Yahoo Flickr Creative Commons 100M dataset (YFCC)

    The audio data is labeled using a vocabulary of 80 labels from Google’s AudioSet Ontology [1], covering diverse topics: Guitar and other Musical Instruments, Percussion, Water, Digestive, Respiratory sounds, Human voice, Human locomotion, Hands, Human group actions, Insect, Domestic animals, Glass, Liquid, Motor vehicle (road), Mechanisms, Doors, and a variety of Domestic sounds. The full list of categories can be inspected in vocabulary.csv (see Files & Download below). The goal of the task was to build a multi-label audio tagging system that can predict appropriate label(s) for each audio clip in a test set.

    What follows is a summary of some of the most relevant characteristics of FSDKaggle2019. Nevertheless, it is highly recommended to read our DCASE 2019 paper for a more in-depth description of the dataset and how it was built.

    Ground Truth Labels

    The ground truth labels are provided at the clip-level, and express the presence of a sound category in the audio clip, hence can be considered weak labels or tags. Audio clips have variable lengths (roughly from 0.3 to 30s).

    The audio content from FSD has been manually labeled by humans following a data labeling process using the Freesound Annotator platform. Most labels have inter-annotator agreement but not all of them. More details about the data labeling process and the Freesound Annotator can be found in [2].

    The YFCC soundtracks were labeled using automated heuristics applied to the audio content and metadata of the original Flickr clips. Hence, a substantial amount of label noise can be expected. The label noise can vary widely in amount and type depending on the category, including in- and out-of-vocabulary noises. More information about some of the types of label noise that can be encountered is available in [3].

    Specifically, FSDKaggle2019 features three types of label quality, one for each set in the dataset:

    curated train set: correct (but potentially incomplete) labels

    noisy train set: noisy labels

    test set: correct and complete labels

    Further details can be found below in the sections for each set.

    Format

    All audio clips are provided as uncompressed PCM 16 bit, 44.1 kHz, mono audio files.
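    A quick format check with Python's standard-library wave module; the file name is hypothetical:

    ```python
    import wave

    # Verify the stated clip format: uncompressed PCM 16 bit, 44.1 kHz, mono.
    with wave.open("some_clip.wav", "rb") as w:
        assert w.getnchannels() == 1      # mono
        assert w.getsampwidth() == 2      # 16-bit samples
        assert w.getframerate() == 44100  # 44.1 kHz
        print(f"{w.getnframes() / w.getframerate():.2f} s")
    ```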

    DATA SPLIT

    FSDKaggle2019 consists of two train sets and one test set. The idea is to limit the supervision provided for training (i.e., the manually-labeled, hence reliable, data), thus promoting approaches to deal with label noise.

    Curated train set

    The curated train set consists of manually-labeled data from FSD.

    Number of clips/class: 75, except in a few cases (where there are fewer)

    Total number of clips: 4970

    Avg number of labels/clip: 1.2

    Total duration: 10.5 hours

    The duration of the audio clips ranges from 0.3 to 30s due to the diversity of the sound categories and the preferences of Freesound users when recording/uploading sounds. Labels are correct but potentially incomplete. It can happen that a few of these audio clips present additional acoustic material beyond the provided ground truth label(s).

    Noisy train set

    The noisy train set is a larger set of noisy web audio data from Flickr videos taken from the YFCC dataset [5].

    Number of clips/class: 300

    Total number of clips: 19,815

    Avg number of labels/clip: 1.2

    Total duration: ~80 hours

    The duration of the audio clips ranges from 1s to 15s, with the vast majority lasting 15s. Labels are automatically generated and purposefully noisy. No human validation is involved. The label noise can vary widely in amount and type depending on the category, including in- and out-of-vocabulary noises.

    Considering the numbers above, the per-class data distribution available for training is, for most of the classes, 300 clips from the noisy train set and 75 clips from the curated train set. This means 80% noisy / 20% curated at the clip level, while at the duration level the proportion is more extreme considering the variable-length clips.

    Test set

    The test set is used for system evaluation and consists of manually-labeled data from FSD.

    Number of clips/class: between 50 and 150

    Total number of clips: 4481

    Avg number of labels/clip: 1.4

    Total duration: 12.9 hours

    The acoustic material present in the test set clips is labeled exhaustively using the aforementioned vocabulary of 80 classes. Most labels have inter-annotator agreement but not all of them. Except for human error, the labels are correct and complete considering the target vocabulary; nonetheless, a few clips could still present additional (unlabeled) acoustic content outside the vocabulary.

    During the DCASE2019 Challenge Task 2, the test set was split into two subsets, for the public and private leaderboards, and only the data corresponding to the public leaderboard was provided. In this current package you will find the full test set with all the test labels. To allow comparison with previous work, the file test_post_competition.csv includes a flag to determine the corresponding leaderboard (public or private) for each test clip (see more info in Files & Download below).

    Acoustic mismatch

    As mentioned before, FSDKaggle2019 uses audio clips from two sources:

    FSD: curated train set and test set, and

    YFCC: noisy train set.

    While the sources of audio (Freesound and Flickr) are collaboratively contributed and pretty diverse themselves, a certain acoustic mismatch can be expected between FSD and YFCC. We conjecture this mismatch comes from a variety of reasons. For example, through acoustic inspection of a small sample of both data sources, we find a higher percentage of high quality recordings in FSD. In addition, audio clips in Freesound are typically recorded with the purpose of capturing audio, which is not necessarily the case in YFCC.

    This mismatch can have an impact in the evaluation, considering that most of the train data come from YFCC, while all test data are drawn from FSD. This constraint (i.e., noisy training data coming from a different web audio source than the test set) is sometimes a real-world condition.

    LICENSE

    All clips in FSDKaggle2019 are released under Creative Commons (CC) licenses. For attribution purposes and to facilitate attribution of these files to third parties, we include a mapping from the audio clips to their corresponding licenses.

    Curated train set and test set. All clips in Freesound are released under different modalities of Creative Commons (CC) licenses, and each audio clip has its own license as defined by the audio clip uploader in Freesound, some of them requiring attribution to their original authors and some forbidding further commercial reuse. The licenses are specified in the files train_curated_post_competition.csv and test_post_competition.csv. These licenses can be CC0, CC-BY, CC-BY-NC and CC Sampling+.

    Noisy train set. Similarly, the licenses of the soundtracks from Flickr used in FSDKaggle2019 are specified in the file train_noisy_post_competition.csv. These licenses can be CC-BY and CC BY-SA.

    In addition, FSDKaggle2019 as a whole is the result of a curation process and it has an additional license. FSDKaggle2019 is released under CC-BY. This license is specified in the LICENSE-DATASET file downloaded with the FSDKaggle2019.doc zip file.

    FILES & DOWNLOAD

    FSDKaggle2019 can be downloaded as a series of zip files with the following directory structure:

    root
    │
    └───FSDKaggle2019.audio_train_curated/    Audio clips in the curated train set
    │
    └───FSDKaggle2019.audio_train_noisy/      Audio clips in the noisy

  13. Data from: Malware Finances and Operations: a Data-Driven Study of the Value...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 20, 2023
    Cite
    Nurmi, Juha (2023). Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8047204
    Explore at:
    Dataset updated
    Jun 20, 2023
    Dataset provided by
    Nurmi, Juha
    Brumley, Billy
    Niemelä, Mikko
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description

    The datasets demonstrate the malware economy and the value chain published in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, published by the International Conference Proceedings Series of the ACM ICPS.

    Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.

    We choose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files from which we've extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.
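    A minimal sketch of iterating over the victim JSON files with Python's standard library; the directory name is hypothetical, and the per-file fields are not specified here, so only generic loading is shown:

    ```python
    import json
    from pathlib import Path

    # Directory name is hypothetical; point it at one of the three datasets.
    victim_files = sorted(Path("MalwareInfectionSet").rglob("*.json"))
    print(f"{len(victim_files)} victim files")

    for path in victim_files[:3]:
        with path.open(encoding="utf-8") as f:
            victim = json.load(f)
        print(path.name, sorted(victim)[:5])  # peek at top-level keys
    ```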

    1. MalwareInfectionSet We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs available for free. We utilise 245 malware log dumps from 2019 and 2020 originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.

    2. VictimAccessSet We demonstrate how Infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including passwords and usernames, but also detailed collection of information which provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.

    3. AccountAccessSet The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoins. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We then collect data from Database Market, where vendors sell online credentials, and investigate similarly. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.

    Credits

    Authors

    Billy Bob Brumley (Tampere University, Tampere, Finland)

    Juha Nurmi (Tampere University, Tampere, Finland)

    Mikko Niemelä (Cyber Intelligence House, Singapore)

    Funding

    This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme, project numbers 804476 (SCARE) and 952622 (SPIRS).

    Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.

  14. c

    ckanext-files - Extensions - CKAN Ecosystem Catalog

    • catalog.civicdataecosystem.org
    Updated Jun 4, 2025
    Cite
    (2025). ckanext-files - Extensions - CKAN Ecosystem Catalog [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-files
    Explore at:
    Dataset updated
    Jun 4, 2025
    Description

    The Files extension enhances CKAN by treating files as first-class objects within the system. It provides the capability to upload, manage, and remove files directly through CKAN, and then associate them with datasets and resources. This file-centric approach offers a streamlined way to handle file attachments and to ensure their proper management within the CKAN environment. Key features:

    • Direct file upload: users can upload files directly to CKAN, simplifying the process of attaching data or related documents.
    • File management: tools for managing uploaded files, potentially including renaming, moving, or updating them.
    • File removal: mechanisms for removing files from CKAN, allowing proper maintenance and cleanup of the file repository.
    • Association with datasets and resources: uploaded files can be linked to specific datasets or resources within CKAN, creating contextual relationships between data and associated files.

    The extension integrates with CKAN as a plugin: "files" must be added to the ckan.plugins setting in the CKAN configuration file, and the extension's database migrations must be run, which suggests it introduces new database tables or modifies existing schemas to store file-related metadata (see the sketch below). By enabling direct file management within CKAN, the Files extension offers a more convenient and organized approach to handling supplementary data. This can lead to better data discoverability and improved resource organization; associating files with datasets enriches both, resulting in greater data utility.
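    As a minimal sketch of that setup on a CKAN 2.9+ install (the migration command is an assumption; the extension's README is authoritative):

        # ckan.ini -- append "files" to the enabled plugins
        ckan.plugins = <existing plugins> files

        # then apply the extension's database migrations, e.g.:
        #   ckan -c /etc/ckan/default/ckan.ini db upgrade -p files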

  15. t

    IMU Data for different Motorcyclist Behaviour

    • researchdata.tuwien.ac.at
    zip
    Updated Jun 25, 2024
    + more versions
    Cite
    Gerhard Navratil; Ioannis Giannopoulos; Ioannis Giannopoulos; Gerhard Navratil; Gerhard Navratil; Gerhard Navratil (2024). IMU Data for different Motorcyclist Behaviour [Dataset]. http://doi.org/10.48436/re6xk-ydq75
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    TU Wien
    Authors
    Gerhard Navratil; Ioannis Giannopoulos; Ioannis Giannopoulos; Gerhard Navratil; Gerhard Navratil; Gerhard Navratil
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Oct 17, 2023
    Description

    The data sets were collected during motorcycle trips near Vienna in 2021 and 2022. The behavior was split into different classes using videos (not part of the published data due to privacy concerns) and then cut into segments of 10 seconds.

    Context and methodology

    • The data set was collected to show how accurately motorcyclist behavior can be assessed using IMU data
    • The work follows the ideas published in http://hdl.handle.net/20.500.12708/43982
    • The authors have backgrounds in geodesy and computer science, respectively, and work in the field of geoinformation / navigation

    Technical details

    • The data are stored as CSV files
    • Each file contains data from a single behavior and has a length of 10 seconds
    • Each file has a header describing the columns
    • Units for acceleration are meters per second squared; units for angles are degrees
    • The files are named AB_Daten_D_C.csv (see the parsing sketch after this list), where:
      • D: Date of the trip (as YYYY_MM_DD)
      • A: Behavior (cruise, fun, overtake, traffic, or wait)
      • B: Number of the occurrence of this behavior during the trip
      • C: Number of the segment within the occurrence
    • The files are grouped into folders named after the corresponding behavior
    • The IMU used to collect the data was an XSENS MTi
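    A minimal sketch for decoding the naming scheme and loading one segment; the exact separators between the A, B, and C fields are an assumption, so inspect the actual file names before relying on the pattern:

        import re
        from pathlib import Path
        import pandas as pd

        NAME = re.compile(
            r"(?P<behavior>cruise|fun|overtake|traffic|wait)"  # A: behavior
            r"(?P<occurrence>\d+)_Daten_"                      # B: occurrence
            r"(?P<date>\d{4}_\d{2}_\d{2})_"                    # D: trip date
            r"(?P<segment>\d+)\.csv$"                          # C: segment
        )

        def load_segment(path: Path):
            # Split the metadata out of the file name, then read the CSV;
            # the header row names the columns.
            meta = NAME.match(path.name)
            if meta is None:
                raise ValueError(f"unexpected file name: {path.name}")
            return meta.groupdict(), pd.read_csv(path)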

  16. d

    Enhanced Historical Land-Use and Land-Cover Data Sets of the U.S. Geological...

    • datadiscoverystudio.org
    zip
    Updated Apr 1, 2007
    + more versions
    Cite
    U.S. Geological Survey (2007). Enhanced Historical Land-Use and Land-Cover Data Sets of the U.S. Geological Survey: Data Source Index Polygons [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/ddbfe61f844f4cc4b364521254abd1c5/html
    Explore at:
    zipAvailable download formats
    Dataset updated
    Apr 1, 2007
    Dataset provided by
    United States Geological Survey, http://www.usgs.gov/
    Area covered
    Description

    Link to the ScienceBase Item Summary page for the item described by this metadata record. Application Profile: Web Browser. Link Function: information

  17. DL-Spectral Challenge data and information

    • zenodo.org
    Updated Nov 21, 2024
    Cite
    Emil Sidky; Emil Sidky (2024). DL-Spectral Challenge data and information [Dataset]. http://doi.org/10.5281/zenodo.13882326
    Explore at:
    Dataset updated
    Nov 21, 2024
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Emil Sidky; Emil Sidky
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This record contains the materials for the DL-sparse-view CT Challenge.

    ----------------------------------------------------------------------------------------
    CONTENTS of data/
    ----------------------------------------------------------------------------------------

    The training data for developing the neural networks is in the subfolder "data".
    All files are compressed with gzip in order to facilitate faster downloads.
    Data are partitioned into four batches, which also facilitates downloading of the
    individual files. Data are in python numpy's .npy format.
    After uncompressing with gunzip, the .npy files can be read into python
    with the numpy.load command, yielding single-precision floating-point arrays
    of the proper dimensions.

    In the "data" folder are:
    Phantom_batch?.npy
    These arrays are 1000x512x512.
    1000 images of pixel dimensions 512x512.
    These are the true images.

    FBP128_batch?.npy
    These arrays are 1000x512x512.
    1000 images of pixel dimensions 512x512.
    These are the FBP reconstructed images from the 128-view sinograms.

    Sinogram_batch?.npy
    These arrays are 1000x128x1024.
    1000 sinograms of 128 projections over 360 degree scanning onto a 1024-pixel linear detector.

    There are four batches. Thus 4000 sets of data/image pairs are available for training
    the neural networks for image reconstruction.
    The goal is to train a network that accepts the FBP128 image (and/or the 128-view sinogram)
    to yield an image that is as close as possible to the corresponding Phantom image.
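    For example, one phantom batch can be loaded directly from its gzip archive (a minimal sketch; the .npy.gz suffix is assumed from the compression scheme described above):

        import gzip
        import numpy as np

        # np.load accepts any seekable file-like object, so the archive can
        # be read without decompressing to disk first.
        with gzip.open("Phantom_batch1.npy.gz", "rb") as fh:
            phantoms = np.load(fh)
        print(phantoms.shape, phantoms.dtype)  # expected: (1000, 512, 512) float32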

    ----------------------------------------------------------------------------------------
    CONTENTS of validation-data/
    ----------------------------------------------------------------------------------------
    Data is in the same arrangement as in the "data/" folder, except that there are only 10 cases.
    As a result, the data are not split into batches and are not compressed.
    Phantom_validation.npy
    These arrays are 10x512x512.
    10 images of pixel dimensions 512x512.
    These are the true images for the validation stage. !!!! KEEP THIS SECRET !!!!

    FBP128_validation.npy
    These arrays are 10x512x512.
    10 images of pixel dimensions 512x512.
    These are the FBP reconstructed images from the 128-view sinograms.

    Sinogram_validation.npy
    These arrays are 10x128x1024.
    10 sinograms of 128 projections over 360 degree scanning onto a 1024-pixel linear detector.


    ----------------------------------------------------------------------------------------
    Contents of this folder
    ----------------------------------------------------------------------------------------
    "smallbatch" data
    metrics.py
    convertMatlab.py
    README (you're reading this now)

    A "smallbatch" set of data is in this folder, containing only 10 phantoms, fbp images, and sinograms.

    These data files are for viewing and are used to demonstrate the
    metrics that will be used to evaluate the submitted images for this Grand Challenge.
    Running the program metrics.py will compare the FBP128 images against the ground truth (Phantom images).
    Hopefully your network will yield images that have lower RMSEs!
    The two metrics are mean image RMSE, and worst-case ROI RMSE for a 25x25 pixel ROI.
    The formulas for these metrics are in [put appropriate url link here],
    and the metrics.py code can also be inspected to see how the calculation is performed.
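    As a rough sketch of the two metrics (metrics.py remains the authoritative implementation, and details such as ROI placement may differ):

        import numpy as np
        from scipy.ndimage import uniform_filter

        def mean_image_rmse(recon, truth):
            # recon, truth: (N, 512, 512) arrays; RMSE per image, then averaged.
            return np.sqrt(((recon - truth) ** 2).mean(axis=(1, 2))).mean()

        def worst_case_roi_rmse(recon, truth, roi=25):
            # Largest RMSE over every roi x roi window of every image;
            # uniform_filter yields each centered window's mean squared error.
            worst, half = 0.0, roi // 2
            for r, t in zip(recon, truth):
                mse = uniform_filter((r - t) ** 2, size=roi, mode="constant")
                valid = mse[half:r.shape[0] - half, half:r.shape[1] - half]
                worst = max(worst, float(np.sqrt(valid.max())))
            return worst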

    The contest data are in numpy's .npy format and test image submission should also use
    this format. For matlab users, a script "convertMatlab.py" is included that shows how
    to convert the "smallbatch" data to matlab's .mat format. Also, converting back to .npy
    is shown in this script.
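    A minimal equivalent of that round trip, with assumed smallbatch file names:

        import numpy as np
        from scipy.io import savemat, loadmat

        arr = np.load("Phantom_smallbatch.npy")              # assumed name
        savemat("Phantom_smallbatch.mat", {"phantom": arr})  # numpy -> matlab

        back = loadmat("Phantom_smallbatch.mat")["phantom"]  # matlab -> numpy
        np.save("Phantom_smallbatch_roundtrip.npy", back)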

  18. Z

    Data associated with "A collaborative filtering based approach to biomedical...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Lever, Jake (2020). Data associated with "A collaborative filtering based approach to biomedical knowledge discovery" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1227312
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset authored and provided by
    Lever, Jake
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the data set associated with the publication: "A collaborative filtering based approach to biomedical knowledge discovery" published in Bioinformatics.

    The data are sets of cooccurrences of biomedical terms extracted from published abstracts and full text articles. The cooccurrences are then represented in sparse matrix form. There are three different splits of this data denoted by the prefix number on the files.

    1. All - All cooccurrences combined in a single file

    2. Training/Validation - All cooccurrences in publications before 2010 go into training; all novel cooccurrences in publications in 2010 go into validation

    3. Training+Validation/Test - All cooccurrences in publications up to and including 2010 go into training+validation. All novel cooccurrences after 2010 are provided in year-by-year increments and also all combined together

    Furthermore, there are subset files which are used in some experiments to deal with the computational cost of evaluating the full set. The associated cuids.txt file contains the mapping between the rows/columns of the matrix and the UMLS Metathesaurus CUIDs: the first row of cuids.txt corresponds to the 0th row/column in the matrix. Note that the matrix is square and symmetric. This work was done with UMLS Metathesaurus 2016AB. A loading sketch follows below.
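    A minimal sketch for pairing the matrix with its CUID labels; the file name and on-disk sparse format are assumptions, so check the archive's documentation:

        from scipy import sparse

        # Assumes a scipy .npz sparse container; if the matrices ship as
        # MatrixMarket (.mtx) files, use scipy.io.mmread instead.
        matrix = sparse.load_npz("all_cooccurrences.npz").tocsr()
        with open("cuids.txt", encoding="utf-8") as fh:
            cuids = [line.strip() for line in fh]

        assert matrix.shape == (len(cuids), len(cuids))  # square, symmetric
        i, j = 0, 42
        print(cuids[i], cuids[j], matrix[i, j])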

  19. Z

    Data from: WaveFake: A data set to facilitate audio DeepFake detection

    • data.niaid.nih.gov
    Updated Jul 18, 2024
    + more versions
    Cite
    Schönherr, Lea (2024). WaveFake: A data set to facilitate audio DeepFake detection [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4904578
    Explore at:
    Dataset updated
    Jul 18, 2024
    Dataset provided by
    Schönherr, Lea
    Frank, Joel
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The main purpose of this data set is to facilitate research into audio DeepFakes. Generated media files of this kind have increasingly been used for impersonation attempts and online harassment; we hope that this work helps in finding new detection methods to prevent such attacks.

    The data set consists of 104,885 generated audio clips (16-bit PCM wav). We examine multiple networks trained on two reference data sets. First, the LJSpeech data set consisting of 13,100 short audio clips (on average 6 seconds each; roughly 24 hours total) read by a female speaker. It features passages from 7 non-fiction books and the audio was recorded on a MacBook Pro microphone. Second, we include samples based on the JSUT data set, specifically, basic5000 corpus. This corpus consists of 5,000 sentences covering all basic kanji of the Japanese language (4.8 seconds on average; roughly 6.7 hours total). The recordings were performed by a female native Japanese speaker in an anechoic room. Finally, we include samples from a full text-to-speech pipeline (16,283 phrases; 3.8s on average; roughly 17.5 hours total). Thus, our data set consists of approximately 175 hours of generated audio files in total. Note that we do not redistribute the reference data.

    We included a range of architectures in our data set:

    MelGAN

    Parallel WaveGAN

    Multi-Band MelGAN

    Full-Band MelGAN

    WaveGlow

    Additionally, we examined a bigger version of MelGAN and include samples from a full TTS-pipeline consisting of a conformer and parallel WaveGAN model.

    Collection Process

    For WaveGlow, we utilize the official implementation (commit 8afb643) in conjunction with the official pre-trained network on PyTorch Hub. We use a popular implementation available on GitHub (commit 12c677e) for the remaining networks. That repository also offers pre-trained models. We used the pre-trained networks to generate samples that are similar to their respective training distributions, LJ Speech and JSUT. When sampling the data set, we first extract Mel spectrograms from the original audio files, using the pre-processing scripts of the corresponding repositories. We then feed these Mel spectrograms to the respective models to obtain the data set; a sketch of this loop follows. For sampling the full TTS results, we use the ESPnet project. To make sure the generated phrases do not overlap with the training set, we downloaded the Common Voice data set and extracted 16,285 phrases from it.
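    As an illustrative sketch of that loop for WaveGlow (the hub repository and entrypoint names follow NVIDIA's published torch.hub example; the Mel input below is a random placeholder rather than the repository's pre-processing output):

        import torch

        # Official pre-trained WaveGlow from PyTorch Hub.
        waveglow = torch.hub.load("NVIDIA/DeepLearningExamples:torchhub",
                                  "nvidia_waveglow")
        waveglow = waveglow.remove_weightnorm(waveglow).to("cuda").eval()

        # Placeholder Mel spectrogram; the real pipeline extracts it from
        # LJ Speech / JSUT audio with the repository's pre-processing scripts.
        mel = torch.randn(1, 80, 620, device="cuda")

        with torch.no_grad():
            audio = waveglow.infer(mel)  # generated waveform, shape (1, samples)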

    This data set is licensed with a CC-BY-SA 4.0 license.

    This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy -- EXC-2092 CaSa -- 390781972.

  20. H

    Replication Data for "News from the Other Side: How Topic Relevance Limits...

    • dataverse.harvard.edu
    • dataone.org
    Updated May 19, 2016
    Cite
    Jonathan Mummolo (2016). Replication Data for "News from the Other Side: How Topic Relevance Limits the Prevalence of Partisan Selective Exposure" [Dataset]. http://doi.org/10.7910/DVN/HQKQCQ
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 19, 2016
    Dataset provided by
    Harvard Dataverse
    Authors
    Jonathan Mummolo
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Jun 11, 2014 - Jun 16, 2014
    Description

    Included are survey data sets and .R script files necessary to replicate all tables and figures. Tables will display in the R console. Figures will save as .pdf files to your working directory. Instructions for replication: These materials will allow for replication in R. You can download data files in .R or .tab format. Save all files in a common folder (directory). Open the .R script file named “jop_replication_dataverse2.R” and change the working directory at the top of the script to the directory where you saved the replication materials. Execute the code in this script file to generate all tables and figures displayed in the manuscript. The script is annotated. Take care to execute the appropriate lines when loading data sets, depending on whether you downloaded the data in .R or .tab format (the script is written to accommodate both formats). Note: the files "results.diff_rep.Rdata" and "results.diff2.Rdata" are R list objects and can only be opened in R. Should you encounter any problems or have any questions, please contact the author at jmummolo@stanford.edu.
