15 datasets found

Supplement 1. R code for estimating thresholds while accounting for variable...

wiley.figshare.com

html

Updated Jun 2, 2023

Facebook

Twitter

Click to copy link

Link copied

Cite

Jay E. Jones; Andrew J. Kroll; Jack Giovanini; Steven D. Duke; Matthew G. Betts (2023). Supplement 1. R code for estimating thresholds while accounting for variable detection and data for estimating thresholds for forest birds, Oregon, USA, 2007–2008. [Dataset]. http://doi.org/10.6084/m9.figshare.3552231.v1

Explore at:

htmlAvailable download formats

Unique identifier

https://doi.org/10.6084/m9.figshare.3552231.v1

Dataset updated

Jun 2, 2023

Dataset provided by

Wiley

Authors

Jay E. Jones; Andrew J. Kroll; Jack Giovanini; Steven D. Duke; Matthew G. Betts

License

CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Area covered

United States, Oregon

Description

File List Supplement_Avian data.csv Supplement_R code.r Description The Supplement_Avian data.csv file contains data on stand-level habitat covariates and visit-specific detections of avian species, Oregon, USA, 2008–2009. Column definitions

    Stand id
    Percent cover of conifer species
    Percent cover of broadleaf species
    Percent cover of deciduous broadleaf species
    Percent cover of hardwood species
    Percent cover of hardwood species in a 2000 m radius circle around each sample stand
    Elevation (m) of stand
    Age of stand
    Year of sampling
    Visit number
    Detection of Magnolia Warbler on Visit 1
    Detection of Magnolia Warbler on Visit 2
    Detection of Orange-crowned Warbler on Visit 1
    Detection of Orange-crowned Warbler on Visit 2
    Detection of Swainson’s Thrush on Visit 1
    Detection of Swainson’s Thrush on Visit 2
    Detection of Willow Flycatcher on Visit 1
    Detection of Willow Flycatcher on Visit 2
    Detection of Wilson’s Warbler on Visit 1
    Detection of Wilson’s Warbler on Visit 1

  Checksum values are:

    Column 2 (Percent cover of conifer species – CONIFER): SUM = 5862.83
    Column 3 (Percent cover of broadleaf species – BROAD): SUM = 7043.17
    Column 4 (Percent cover of deciduous broadleaf species – DECBROAD): SUM = 5475.17
    Column 5 (Percent cover of hardwood species – HARDWOOD): SUM = 2151.96
    Column 6 (Percent cover of hardwood species in a 2000 m radius circle around each sample stand– HWD2000): SUM = 3486.07
    Column 7 (Stand elevation – ELEVM): SUM = 83240.58
    Column 8 (Stand age – AGE): SUM = 1537; NA indicates a stand was harvested in 2008
    Column 9 (Year of sampling – YEAR): SUM = 425792
    Column 11 (MGWA.1): SUM = 70
    Column 12 (MGWA.2): SUM = 71
    Column 13 (OCWA.1): SUM = 121
    Column 14 (OCWA.2): SUM = 76
    Column 15 (SWTH.1): SUM = 90
    Column 16 (SWTH.2): SUM = 95
    Column 17 (WIFL.1): SUM = 85
    Column 18 (WIFL.2): SUM = 85
    Column 19 (WIWA.1): SUM = 36
    Column 20 (WIWA.2): SUM = 37

  The Supplement_R code.r file is R source code for simulation and empirical analyses conducted in Jones et al.

o
Uniform Crime Reporting (UCR) Program Data: Hate Crime Data 1992-2017
openicpsr.org
Updated May 18, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jacob Kaplan (2018). Uniform Crime Reporting (UCR) Program Data: Hate Crime Data 1992-2017 [Dataset]. http://doi.org/10.3886/E103500V4
Explore at:
Unique identifier
https://doi.org/10.3886/E103500V4
Dataset updated
May 18, 2018
Dataset provided by
University of Pennsylvania
Authors
Jacob Kaplan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
1992 - 2017
Area covered
United States
Description
Version 4 release notes: Adds data for 2017.Adds rows that submitted a zero-report (i.e. that agency reported no hate crimes in the year). This is for all years 1992-2017. Made changes to categorical variables (e.g. bias motivation columns) to make categories consistent over time. Different years had slightly different names (e.g. 'anti-am indian' and 'anti-american indian') which I made consistent. Made the 'population' column which is the total population in that agency. Version 3 release notes: Adds data for 2016.Order rows by year (descending) and ORI.Version 2 release notes: Fix bug where Philadelphia Police Department had incorrect FIPS county code. The Hate Crime data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains information about hate crimes reported in the United States. The data sets here combine all data from the years 1992-2015 into a single file. Please note that the files are quite large and may take some time to open.Each row indicates a hate crime incident for an agency in a given year. I have made a unique ID column ("unique_id") by combining the year, agency ORI9 (the 9 character Originating Identifier code), and incident number columns together. Each column is a variable related to that incident or to the reporting agency. Some of the important columns are the incident date, what crime occurred (up to 10 crimes), the number of victims for each of these crimes, the bias motivation for each of these crimes, and the location of each crime. It also includes the total number of victims, total number of offenders, and race of offenders (as a group). Finally, it has a number of columns indicating if the victim for each offense was a certain type of victim or not (e.g. individual victim, business victim religious victim, etc.). All the data was downloaded from NACJD as ASCII+SPSS Setup files and read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see here. https://github.com/jacobkap/crime_data. The only changes I made to the data are the following. Minor changes to column names to make all column names 32 characters or fewer (so it can be saved in a Stata format), changed the name of some UCR offense codes (e.g. from "agg asslt" to "aggravated assault"), made all character values lower case, reordered columns. I also added state, county, and place FIPS code from the LEAIC (crosswalk) and generated incident month, weekday, and month-day variables from the incident date variable included in the original data. The zip file contains the data in the following formats and a codebook: .dta - Stata.rda - RIf you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.
o
Uniform Crime Reporting (UCR) Program Data: Hate Crime Data 1992-2015
openicpsr.org
Updated May 18, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jacob Kaplan (2018). Uniform Crime Reporting (UCR) Program Data: Hate Crime Data 1992-2015 [Dataset]. http://doi.org/10.3886/E103500V1
Explore at:
Unique identifier
https://doi.org/10.3886/E103500V1
Dataset updated
May 18, 2018
Dataset provided by
University of Pennsylvania
Authors
Jacob Kaplan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
1992 - 2015
Area covered
United States
Description
The Hate Crime data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains information about hate crimes reported in the United States. The data sets here combine all data from the years 1992-2015 into a single file. Please note that the files are quite large and may take some time to open.Each row indicates a hate crime incident for an agency in a given year. I have made a unique ID column ("unique_id") by combining the year, agency ORI9 (the 9 character Originating Identifier code), and incident number columns together. Each column is a variable related to that incident or to the reporting agency. Some of the important columns are the incident date, what crime occurred (up to 10 crimes), the number of victims for each of these crimes, the bias motivation for each of these crimes, and the location of each crime. It also includes the total number of victims, total number of offenders, and race of offenders (as a group). Finally, it has a number of columns indicating if the victim for each offense was a certain type of victim or not (e.g. individual victim, business victim religious victim, etc.). All the data was downloaded from NACJD as ASCII+SPSS Setup files and read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see here. https://github.com/jacobkap/crime_data. The only changes I made to the data are the following. Minor changes to column names to make all column names 32 characters or fewer (so it can be saved in a Stata format), changed the name of some UCR offense codes (e.g. from "agg asslt" to "aggravated assault"), made all character values lower case, reordered columns. I also added state, county, and place FIPS code from the LEAIC (crosswalk) and generated incident month, weekday, and month-day variables from the incident date variable included in the original data. The zip file contains the data in the following formats and a codebook: .csv - Microsoft Excel.dta - Stata.sav - SPSS.rda - RIf you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.
Z
Data and scripts for the analysis of the influence of crop pollinator...
data.niaid.nih.gov
Updated Aug 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gleiser, Gabriela (2023). Data and scripts for the analysis of the influence of crop pollinator dependence and growth form on yield decline [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7863824
Explore at:
Dataset updated
Aug 8, 2023
Dataset provided by
Gleiser, Gabriela
Aizen, Marcelo Adrián
Kitzberger, Thomas
Milla, Rubén
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Marcelo A. Aizen, Gabriela R. Gleiser, Thomas Kitzberger, Ruben Milla. Being a tree crop increases the odds of experiencing yield declines irrespective of pollinator dependence (to be submitted to PCI)

Data and R scripts to reproduce the analyses and the figures shown in the paper. All analyses were performed using R 4.0.2.

Data

FAOdata_21-12-2021.csv

This file includes yearly data (1961-2020, column 8) on yield and cultivated area (columns 6 and 10) at the country, sub-regional, and regional levels (column 2) for each crop (column 4) drawn from the United Nations Food and Agriculture Organization database (data available at http://www.fao.org/faostat/en; accessed July 21-12-2021). [Used in Script 1 to generate the synthesis dataset]

countries.csv

This file provides information on the region (column 2) to which each country (column 1) belongs. [Used in Script 1 to generate the synthesis dataset]

dependence.csv

This file provides information on the pollinator dependence category (column 2) of each crop (column 1).

traits.csv

This file provides information on the traits of each crop other than pollinator dependence, including, besides the crop name (column1), the variables type of harvested organ (column 5) and growth form (column 6). [Used in Script 1 to generate the synthesis dataset]

dataset.csv

The synthesis dataset generated by Script 1.

growth.csv

The yield growth dataset generated by Script 1 and used as input by Scripts 2 and 3.

phylonames.csv

This file lists all the crops (column 1) and their equivalent tip names in the crop phylogeny (column 2). [Used in Script 2 for the phylogenetically-controlled analyses]

8.phylo137.tre

File containing the phylogenetic tree.

Scripts

dataset

This R script curates and merges all the individual datasets mentioned above into a single dataset, estimating and adding to this single dataset the growth rate for each crop and country, and the (log) cumulative harvested area per crop and country over the period 1961-2020.

analyses

This R script includes all the analyses described in the article’s main text.

figures

This R script creates all the main and supplementary figures of this article.

lme4_phylo_setup

R function written by Li and Bolker (2019) to carry out phylogenetically-controlled generalized linear mixed-effects models as described in the main text of the article.

References

Li, M., and B. Bolker. 2019. wzmli/phyloglmm: First release of phylogenetic comparative analysis in lme4- verse. Zenodo. https://doi.org/10.5281/zenodo.2639887.
An Index to Roscher's Lexicon of Mythology
zenodo.org
txt
Updated Aug 11, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jonathan Groß; Jonathan Groß (2024). An Index to Roscher's Lexicon of Mythology [Dataset]. http://doi.org/10.5281/zenodo.13294014
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.13294014
Dataset updated
Aug 11, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Jonathan Groß; Jonathan Groß
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
May 7, 2024
Description
compiled by Jonathan Groß
ORCID 0000-0002-2564-9530
jgross85 [AT] gmail [DOT] com

1. Introduction

1.1. General Disclaimer

This index file was created as a private research project with the goal to make the wealth of information in Wilhelm Heinrich Roscher's "Detailed Lexicon of Greek and Roman Mythology" (Ausführliches Lexikon der griechischen und römischen Mythologie) more accessible to everybody.

Roscher's Lexicon, originally published by B. G. Teubner in Leipzig from 1884 to 1937, is the most complete resource on Greek and Roman mythological names to date and also encompasses mythological (and religious) subjects from Sumeran, Akkadian, Babylonian, Hittite, Egyptian, Celtic, Germanic and other neighbouring cultures.

The Lexicon was reprinted three times in the latter half of the 20th century (by Georg Olms in Hildesheim), even after its pictorial content had been superseded by the Lexicon Iconographicum Mythologiae Classicae (1981–1999, 2009). Unfortunately, since the last reprint of 1992/1993, Roscher's Lexicon has been out of stock at both publishers (Olms and De Gruyter Brill).

Since the late 2000s, Roscher's Lexicon (the 6 main volumes and 4 supplements) was digitised by initiatives such as Google Books and the Internet Archive, and its contents can now be viewed there (with OCR text). One prominent use case of these scans is the German Wikipedia, where more than 2,500 pages use Roscher's Lexicon as a reference with a link to a scanned page in the Internet Archive.

1.2. Licensing

This dataset is released under the CC0 1.0 Universal License (https://creativecommons.org/publicdomain/zero/1.0/deed.en). I chose this license in order to maximise the usefulness of the data to everybody.

Use and reuse of this data is strongly encouraged, and one use case has already been initiated by the author:
-https://mythogram.wikibase.cloud/wiki/Project:Roscher%27s_Lexicon_of_Mythology (presentation of the information from the index file as Linked Open Data, finished on 28 May 2024 with emendations until 5 August 2024)

Although not technically required by the licensing agreement, the author would appreciate being informed about other uses of the data.

The contents of the Lexicon themselves are mostly in the Public Domain as of 2024. Additionally, many of the smaller entries do not reach the threshold of originality. This includes most of the cover addenda.

2. Description of Data

The index file is formatted as tabular data. This file was created with LibreOffice Calc (originally in 7.6.6.3, in LibreOffice Calc 24.2 as of version 1.1 of this file) and is stored in its native .ods format. For convenience, an .xlsx version is also provided. Both files are practically identical, but the .ods file is to be regarded as the 'original'.

Data is stored in several tabs:
(A) 'main alphabet' with the headwords of the main work (excluding addenda and corrigenda from the covers; for these see below).
(B) 'cover addenda' with the additional entries
(C) 'authors' with information on the authors
(D) 'fascicles' with information on the individual issues of the Lexicon

The Tabs are available separately as .csv files (with tab separation, so strictly speaking it should be .tsv).

Tabs A and B are almost identical in structure, with the columns:
A id = unique entry ID (not authoritative, just a means to identify individual entries)
B headword = lemma of the entry as stated by the Lexicon
C subject_type = classification scheme for the subject matter of the article (again, not authoritative and in places even contentious)
D vol = volume number
E fascicle = issue number (not found in most exemplars, assigned according to my own research)
F date = publication date of the entry (inferred from the issue date of the fascicle)
G–H col1,2 = start and end column
I colspan = span of columns
J–M author1,2,3,4 = author of the entry (please refer to Tab C, column A)
N entry_type = classification of entry (article, cross-reference, addendum, correction)
O scan = URL to a scan of the start column in the Internet Archive
P Wikidata = ID of the Wikidata item representing the subject (incomplete as of Version 1.1)
Q FactGrid = ID of the FactGrid item representing the subject (mostly missing as of Version 1.1)
R Mythogram = ID of the (bibliographic) Mythogram item representing the Lexicon entry
S redirect_target = target headword as stated, if the entry is a cross-reference
T remarks = remarks on the entry or subject (such as 'non-entity', 'duplicate', 'double lemma')
U PD = if the entry is in the Public Domain (either 'yes' or year where it enters the PD)

Tab B has two additional columns, which are mostly empty as of version 1.1
V referring_to = target entry (in the main alphabet) of the correction or addenda
W excerpt = textual excerpt from the entry

Tab C has information on the authors:
A short_name = for sorting reasons
B full_name = full name
C Wikidata = Wikidata item
D FactGrid = FactGrid item
E Mythogram = Mythogram item
F yob = year of birth
G yod = year of death
H vols = volumes contributed to
I article_count = number of articles written (not counting corrections and addenda from Tab B)
J–L namestring1,2,3 = name as written in the Lexicon
M remarks = remarks on completeness and certainty of data

Tab D informs about the individual fascicles of the Lexicon as they appeared from 1884 to 1937:
A no. = fascicle number
B vol = volume(s) the fascicle belongs to
C colspan = column span of the fascicle
D headwords = headwords contained in the fascicle as advertised on the cover page
E issue_date = date of publication of the fascicle as stated on the cover page
F quires = quire numbers of the fascicle
G quire_count = quire count of the fascicle (calculated from column numbers: in some cases, at the end of a volume, quires were shortened, returning rational numbers here)
H remarks = remarks (in German)

3. Version History and Change Log

--------------------------------------------------
Version 1.1 (August 11th, 2024)
-Tab A, column P (Wikidata Q-ids): added 2,380 out of 15,489 = 15.4%)
-Tab A, column R (Mythogram Q-ids): completed
-Tab C: added data for author Wilhelm Windisch (translated Cumont's article on 'Mithras')
-minor corrections to some entries (typos)
-volume number changed from 3.2 to 3.1 for 125 entries (Pasikrateia–Peirithoos)
-fascicle number changed from 104/105 to 106/107 for the last 12 entries (Tameobrigus–Kerberos [Nachtrag])
-addition of 11 missed entries, values in column A renumbered accordingly

--------------------------------------------------
Version 1.0 (May 4th, 2024)
-Tabs A–B with complete and checked data for columns A–N and S
-Tab A with complete data for column O
-Tabs C and D with complete data

--------------------------------------------------
Prior to publication:

-collection and checking of data (roughly 376 hours of work, started in July 2023 and finished on Star Wars Day 2024)
Tennessee Eastman Process Simulation Dataset
kaggle.com
zip
Updated Feb 9, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sergei Averkiev (2020). Tennessee Eastman Process Simulation Dataset [Dataset]. https://www.kaggle.com/averkij/tennessee-eastman-process-simulation-dataset
Explore at:
zip(1370814903 bytes)Available download formats
Dataset updated
Feb 9, 2020
Authors
Sergei Averkiev
Description
Intro

This dataverse contains the data referenced in Rieth et al. (2017). Issues and Advances in Anomaly Detection Evaluation for Joint Human-Automated Systems. To be presented at Applied Human Factors and Ergonomics 2017.

Content

Each .RData file is an external representation of an R dataframe that can be read into an R environment with the 'load' function. The variables loaded are named ‘fault_free_training’, ‘fault_free_testing’, ‘faulty_testing’, and ‘faulty_training’, corresponding to the RData files.

Each dataframe contains 55 columns:

Column 1 ('faultNumber') ranges from 1 to 20 in the “Faulty” datasets and represents the fault type in the TEP. The “FaultFree” datasets only contain fault 0 (i.e. normal operating conditions).

Column 2 ('simulationRun') ranges from 1 to 500 and represents a different random number generator state from which a full TEP dataset was generated (Note: the actual seeds used to generate training and testing datasets were non-overlapping).

Column 3 ('sample') ranges either from 1 to 500 (“Training” datasets) or 1 to 960 (“Testing” datasets). The TEP variables (columns 4 to 55) were sampled every 3 minutes for a total duration of 25 hours and 48 hours respectively. Note that the faults were introduced 1 and 8 hours into the Faulty Training and Faulty Testing datasets, respectively.

Columns 4 to 55 contain the process variables; the column names retain the original variable names.

Acknowledgements

This work was sponsored by the Office of Naval Research, Human & Bioengineered Systems (ONR 341), program officer Dr. Jeffrey G. Morrison under contract N00014-15-C-5003. The views expressed are those of the authors and do not reflect the official policy or position of the Office of Naval Research, Department of Defense, or US Government.

User Agreement

By accessing or downloading the data or work provided here, you, the User, agree that you have read this agreement in full and agree to its terms.

The person who owns, created, or contributed a work to the data or work provided here dedicated the work to the public domain and has waived his or her rights to the work worldwide under copyright law. You can copy, modify, distribute, and perform the work, for any lawful purpose, without asking permission.

In no way are the patent or trademark rights of any person affected by this agreement, nor are the rights that any other person may have in the work or in how the work is used, such as publicity or privacy rights.

Pacific Science & Engineering Group, Inc., its agents and assigns, make no warranties about the work and disclaim all liability for all uses of the work, to the fullest extent permitted by law.

When you use or cite the work, you shall not imply endorsement by Pacific Science & Engineering Group, Inc., its agents or assigns, or by another author or affirmer of the work.

This Agreement may be amended, and the use of the data or work shall be governed by the terms of the Agreement at the time that you access or download the data or work from this Website.
o
Uniform Crime Reporting (UCR) Program Data: Hate Crime Data 1992-2016
openicpsr.org
datasearch.gesis.org
Updated May 18, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jacob Kaplan (2018). Uniform Crime Reporting (UCR) Program Data: Hate Crime Data 1992-2016 [Dataset]. http://doi.org/10.3886/E103500V3
Explore at:
Unique identifier
https://doi.org/10.3886/E103500V3
Dataset updated
May 18, 2018
Dataset provided by
University of Pennsylvania
Authors
Jacob Kaplan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
1992 - 2015
Area covered
United States
Description
Version 3 release notes: Adds data for 2016.Order rows by year (descending) and ORI.Version 2 release notes: Fix bug where Philadelphia Police Department had incorrect FIPS county code. The Hate Crime data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains information about hate crimes reported in the United States. The data sets here combine all data from the years 1992-2015 into a single file. Please note that the files are quite large and may take some time to open.Each row indicates a hate crime incident for an agency in a given year. I have made a unique ID column ("unique_id") by combining the year, agency ORI9 (the 9 character Originating Identifier code), and incident number columns together. Each column is a variable related to that incident or to the reporting agency. Some of the important columns are the incident date, what crime occurred (up to 10 crimes), the number of victims for each of these crimes, the bias motivation for each of these crimes, and the location of each crime. It also includes the total number of victims, total number of offenders, and race of offenders (as a group). Finally, it has a number of columns indicating if the victim for each offense was a certain type of victim or not (e.g. individual victim, business victim religious victim, etc.). All the data was downloaded from NACJD as ASCII+SPSS Setup files and read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see here. https://github.com/jacobkap/crime_data. The only changes I made to the data are the following. Minor changes to column names to make all column names 32 characters or fewer (so it can be saved in a Stata format), changed the name of some UCR offense codes (e.g. from "agg asslt" to "aggravated assault"), made all character values lower case, reordered columns. I also added state, county, and place FIPS code from the LEAIC (crosswalk) and generated incident month, weekday, and month-day variables from the incident date variable included in the original data. The zip file contains the data in the following formats and a codebook: .csv - Microsoft Excel.dta - Stata.sav - SPSS.rda - RIf you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.
H
Additional Tennessee Eastman Process Simulation Data for Anomaly Detection...
dataverse.harvard.edu
dataone.org
Updated Jul 6, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Harvard Dataverse (2017). Additional Tennessee Eastman Process Simulation Data for Anomaly Detection Evaluation [Dataset]. http://doi.org/10.7910/DVN/6C3JR1
Explore at:
application/x-rlang-transport(24678017)Available download formats
Unique identifier
https://doi.org/10.7910/DVN/6C3JR1
Dataset updated
Jul 6, 2017
Dataset provided by
Harvard Dataverse
License
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/6C3JR1https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/6C3JR1
Description
User Agreement, Public Domain Dedication, and Disclaimer of Liability. By accessing or downloading the data or work provided here, you, the User, agree that you have read this agreement in full and agree to its terms. The person who owns, created, or contributed a work to the data or work provided here dedicated the work to the public domain and has waived his or her rights to the work worldwide under copyright law. You can copy, modify, distribute, and perform the work, for any lawful purpose, without asking permission. In no way are the patent or trademark rights of any person affected by this agreement, nor are the rights that any other person may have in the work or in how the work is used, such as publicity or privacy rights. Pacific Science & Engineering Group, Inc., its agents and assigns, make no warranties about the work and disclaim all liability for all uses of the work, to the fullest extent permitted by law. When you use or cite the work, you shall not imply endorsement by Pacific Science & Engineering Group, Inc., its agents or assigns, or by another author or affirmer of the work. This Agreement may be amended, and the use of the data or work shall be governed by the terms of the Agreement at the time that you access or download the data or work from this Website. Description This dataverse contains the data referenced in Rieth et al. (2017). Issues and Advances in Anomaly Detection Evaluation for Joint Human-Automated Systems. To be presented at Applied Human Factors and Ergonomics 2017. Each .RData file is an external representation of an R dataframe that can be read into an R environment with the 'load' function. The variables loaded are named ‘fault_free_training’, ‘fault_free_testing’, ‘faulty_testing’, and ‘faulty_training’, corresponding to the RData files. Each dataframe contains 55 columns: Column 1 ('faultNumber') ranges from 1 to 20 in the “Faulty” datasets and represents the fault type in the TEP. The “FaultFree” datasets only contain fault 0 (i.e. normal operating conditions). Column 2 ('simulationRun') ranges from 1 to 500 and represents a different random number generator state from which a full TEP dataset was generated (Note: the actual seeds used to generate training and testing datasets were non-overlapping). Column 3 ('sample') ranges either from 1 to 500 (“Training” datasets) or 1 to 960 (“Testing” datasets). The TEP variables (columns 4 to 55) were sampled every 3 minutes for a total duration of 25 hours and 48 hours respectively. Note that the faults were introduced 1 and 8 hours into the Faulty Training and Faulty Testing datasets, respectively. Columns 4 to 55 contain the process variables; the column names retain the original variable names. Acknowledgments. This work was sponsored by the Office of Naval Research, Human & Bioengineered Systems (ONR 341), program officer Dr. Jeffrey G. Morrison under contract N00014-15-C-5003. The views expressed are those of the authors and do not reflect the official policy or position of the Office of Naval Research, Department of Defense, or US Government.
E
[PANDORA Total column ozone] - Total column ozone (O3) measured measured...
erddap.bco-dmo.org
Updated Aug 16, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
BCO-DMO (2019). [PANDORA Total column ozone] - Total column ozone (O3) measured measured during DANCE cruise HRS1414 aboard the R/V Hugh R. Sharp from July to August 2014. (Collaborative Research: Impacts of atmospheric nitrogen deposition on the biogeochemistry of oligotrophic coastal waters) [Dataset]. https://erddap.bco-dmo.org/erddap/info/bcodmo_dataset_732116/index.html
Explore at:
Dataset updated
Aug 16, 2019
Dataset provided by
Biological and Chemical Oceanographic Data Management Office (BCO-DMO)
Authors
BCO-DMO
License
https://www.bco-dmo.org/dataset/732116/licensehttps://www.bco-dmo.org/dataset/732116/license
Area covered

Variables measured
Temp, Zenith, Azimuth, O3_temp, O3_vert, latitude, Date_Time, O3_uncert, Samp_time, err_index, and 10 more
Description
Total column ozone (O3) measured measured during DANCE cruise HRS1414 aboard the R/V Hugh R. Sharp from July to August 2014. access_formats=.htmlTable,.csv,.json,.mat,.nc,.tsv,.esriCsv,.geoJson acquisition_description="" awards_0_award_nid=726327 awards_0_award_number=OCE-1260574 awards_0_data_url=http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1260574 awards_0_funder_name=NSF Division of Ocean Sciences awards_0_funding_acronym=NSF OCE awards_0_funding_source_nid=355 awards_0_program_manager=Henrietta N Edmonds awards_0_program_manager_nid=51517 cdm_data_type=Other comment=PANDORA Total column ozone PI: R. Najjar (Penn State) Dataset ID: 732116 Version: 1 Last updated: 2018-04-18 Conventions=COARDS, CF-1.6, ACDD-1.3 data_source=extract_data_as_tsv version 2.3 19 Dec 2019 defaultDataQuery=&time<now doi=10.1575/1912/bco-dmo.732116.1 Easternmost_Easting=-71.1105 geospatial_lat_max=38.8218 geospatial_lat_min=34.2441 geospatial_lat_units=degrees_north geospatial_lon_max=-71.1105 geospatial_lon_min=-75.1543 geospatial_lon_units=degrees_east infoUrl=https://www.bco-dmo.org/dataset/732116 institution=BCO-DMO instruments_0_acronym=Radiometer instruments_0_dataset_instrument_nid=732128 instruments_0_description=Radiometer is a generic term for a range of instruments used to measure electromagnetic radiation (radiance and irradiance) in the atmosphere or the water column. For example, this instrument category includes free-fall spectral radiometer (SPMR/SMSR System, Satlantic, Inc), profiling or deck cosine PAR units (PUV-500 and 510, Biospherical Instruments, Inc). This is a generic term used when specific type, make and model were not specified. instruments_0_instrument_external_identifier=https://vocab.nerc.ac.uk/collection/L05/current/122/ instruments_0_instrument_name=Radiometer instruments_0_instrument_nid=442 instruments_0_supplied_name=Metcon 2-pi radiometer instruments_1_acronym=Eppley PSP instruments_1_dataset_instrument_nid=732124 instruments_1_description=This radiometer measures sun and sky irradiance in the range of wavelengths 0.285 to 2.8 microns, including most of the solar spectrum. The PSP is intended to weight the energy flux in all wavelengths equally. It is a "hemispheric receiver" intended to approximate the cosine response for oblique rays. The Eppley Precision Spectral Pyranometer (PSP) is primarily used where high accuracy is required or where it is used to calibrate other pyranometers. The PSP outputs a low level voltage ranging from 0 to a maximum of about 12mV depending on sensor calibration and radiation level. An instruction manual provided by Eppley contains the sensor calibration constant and serial number. The Precision Spectral Pyranometer is a World Meteorological Organization First Class Radiometer and comes with a calibration certificate traceable to the World Radiation Reference and a temperature compensation curve. More information is available from Eppley Labs. instruments_1_instrument_external_identifier=https://vocab.nerc.ac.uk/collection/L05/current/112/ instruments_1_instrument_name=Precision Spectral Pyranometer instruments_1_instrument_nid=483 instruments_1_supplied_name=Eppley PSP instruments_2_acronym=Eppley PIR instruments_2_dataset_instrument_nid=732126 instruments_2_description=The Eppley Precision Infrared Radiometer (PIR) pyrgeometer measures longwave (infrared) radiation. It is housed in a weatherproof titanium canister that has been painted with a very flat black paint that absorbs radiation. A small glass dome at the top of the instrument is covered with an 'interference coating' which allows only infrared radiation to come through. Light levels are detected as temperature changes creating voltages in fine wire coil detectors. more from Eppley Labs instruments_2_instrument_external_identifier=https://vocab.nerc.ac.uk/collection/L22/current/TOOL0083/ instruments_2_instrument_name=Eppley Longwave Radiometer instruments_2_instrument_nid=517 instruments_2_supplied_name=Eppley PIR instruments_3_acronym=PrecipGauge instruments_3_dataset_instrument_nid=732127 instruments_3_description=measures rain or snow precipitation instruments_3_instrument_external_identifier=https://vocab.nerc.ac.uk/collection/L05/current/381/ instruments_3_instrument_name=Precipitation Gauge instruments_3_instrument_nid=671 instruments_3_supplied_name=R.M. Young 50202 instruments_4_acronym=Spectrophotometer instruments_4_dataset_instrument_nid=732130 instruments_4_description=An instrument used to measure the relative absorption of electromagnetic radiation of different wavelengths in the near infra-red, visible and ultraviolet wavebands by samples. instruments_4_instrument_external_identifier=https://vocab.nerc.ac.uk/collection/L05/current/LAB20/ instruments_4_instrument_name=Spectrophotometer instruments_4_instrument_nid=707 instruments_4_supplied_name=Aerodyne CAPS; Thermo 49C; Thermo 48; Thermo 42C metadata_source=https://www.bco-dmo.org/api/dataset/732116 Northernmost_Northing=38.8218 param_mapping={'732116': {'lat': 'flag - latitude', 'lon': 'flag - longitude'}} parameter_source=https://www.bco-dmo.org/mapserver/dataset/732116/parameters people_0_affiliation=Pennsylvania State University people_0_affiliation_acronym=PSU people_0_person_name=Dr Raymond Najjar people_0_person_nid=50813 people_0_role=Principal Investigator people_0_role_type=originator people_1_affiliation=Pennsylvania State University people_1_affiliation_acronym=PSU people_1_person_name=Douglas K. Martins people_1_person_nid=734294 people_1_role=Scientist people_1_role_type=originator people_2_affiliation=Woods Hole Oceanographic Institution people_2_affiliation_acronym=WHOI BCO-DMO people_2_person_name=Megan Switzer people_2_person_nid=708683 people_2_role=BCO-DMO Data Manager people_2_role_type=related project=DANCE projects_0_acronym=DANCE projects_0_description=NSF abstract: Deposition of atmospheric nitrogen provides reactive nitrogen species that influence primary production in nitrogen-limited regions. Although it is generally assumed that these species in precipitation contributes substantially to anthropogenic nitrogen loadings in many coastal marine systems, its biological impact remains poorly understood. Scientists from Pennsylvania State University, William & Mary College, and Old Dominion University will carry out a process-oriented field and modeling effort to test the hypothesis that deposits of wet atmospheric nitrogen (i.e., precipitation) stimulate primary productivity and accumulation of algal biomass in coastal waters following summer storms and this effect exceeds the associated biogeochemical responses to wind-induced mixing and increased stratification caused by surface freshening in oligotrophic coastal waters of the eastern United States. To attain their goal, the researchers would perform a Lagrangian field experiment during the summer months in coastal waters located between Delaware Bay and the coastal Carolinas to determine the response of surface-layer biogeochemistry and biology to precipitation events, which will be identified and intercepted using radar and satellite data. As regards the modeling effort, a 1-D upper ocean mixing model and a 1-D biogeochemical upper-ocean will be calibrated by assimilating the field data obtained a part of the study using the adjoint method. The hypothesis will be tested using sensitivity studies with the calibrated model combined with in-situ data and results from the incubation experiments. Lastly, to provide regional and historical context for the field measurements and the associated 1-D modeling, linked regional atmospheric-oceanic biogeochemical modeling will be conducted. Broader Impacts. Results from the study would be incorporated into class lectures for graduate courses on marine policy and marine biogeochemistry. One graduate student from Pennsylvania State University, one graduate student from the College of William and Mary, and one graduate and one undergraduate student from Old Dominion University would be supported and trained as part of this project. projects_0_end_date=2017-02 projects_0_geolocation=Offshore Mid-Atlantic Bight and northern South-Atlantic Bight between latitudes 31.60°N and 38.89°N, and longitudes 71.09°W and 75.16°W projects_0_name=Collaborative Research: Impacts of atmospheric nitrogen deposition on the biogeochemistry of oligotrophic coastal waters projects_0_project_nid=726328 projects_0_start_date=2013-03 sourceUrl=(local files) Southernmost_Northing=34.2441 standard_name_vocabulary=CF Standard Name Table v55 subsetVariables=Res_stray_light,Result_index,O3_temp,O3_temp_uncert version=1 Westernmost_Easting=-75.1543 xml_source=osprey2erddap.update_xml() v1.3
Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race,...
search.datacite.org
openicpsr.org
Updated 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jacob Kaplan (2018). Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race, 1980-2016 [Dataset]. http://doi.org/10.3886/e102263v5-10021
Explore at:
Unique identifier
https://doi.org/10.3886/e102263v5-10021
Dataset updated
2018
Dataset provided by
DataCitehttps://www.datacite.org/
Inter-university Consortium for Political and Social Researchhttps://www.icpsr.umich.edu/web/pages/
Authors
Jacob Kaplan
Description
Version 5 release notes:
Removes support for SPSS and Excel data.Changes the crimes that are stored in each file. There are more files now with fewer crimes per file. The files and their included crimes have been updated below.
Adds in agencies that report 0 months of the year.Adds a column that indicates the number of months reported. This is generated summing up the number of unique months an agency reports data for. Note that this indicates the number of months an agency reported arrests for ANY crime. They may not necessarily report every crime every month. Agencies that did not report a crime with have a value of NA for every arrest column for that crime.Removes data on runaways.
Version 4 release notes:
Changes column names from "poss_coke" and "sale_coke" to "poss_heroin_coke" and "sale_heroin_coke" to clearly indicate that these column includes the sale of heroin as well as similar opiates such as morphine, codeine, and opium. Also changes column names for the narcotic columns to indicate that they are only for synthetic narcotics.
Version 3 release notes:
Add data for 2016.Order rows by year (descending) and ORI.Version 2 release notes:
Fix bug where Philadelphia Police Department had incorrect FIPS county code.
The Arrests by Age, Sex, and Race data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains highly granular data on the number of people arrested for a variety of crimes (see below for a full list of included crimes). The data sets here combine data from the years 1980-2015 into a single file. These files are quite large and may take some time to load.
All the data was downloaded from NACJD as ASCII+SPSS Setup files and read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see here. https://github.com/jacobkap/crime_data. If you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.

I did not make any changes to the data other than the following. When an arrest column has a value of "None/not reported", I change that value to zero. This makes the (possible incorrect) assumption that these values represent zero crimes reported. The original data does not have a value when the agency reports zero arrests other than "None/not reported." In other words, this data does not differentiate between real zeros and missing values. Some agencies also incorrectly report the following numbers of arrests which I change to NA: 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99999, 99998.

To reduce file size and make the data more manageable, all of the data is aggregated yearly. All of the data is in agency-year units such that every row indicates an agency in a given year. Columns are crime-arrest category units. For example, If you choose the data set that includes murder, you would have rows for each agency-year and columns with the number of people arrests for murder. The ASR data breaks down arrests by age and gender (e.g. Male aged 15, Male aged 18). They also provide the number of adults or juveniles arrested by race. Because most agencies and years do not report the arrestee's ethnicity (Hispanic or not Hispanic) or juvenile outcomes (e.g. referred to adult court, referred to welfare agency), I do not include these columns.

To make it easier to merge with other data, I merged this data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) data. The data from the LEAIC add FIPS (state, county, and place) and agency type/subtype. Please note that some of the FIPS codes have leading zeros and if you open it in Excel it will automatically delete those leading zeros.

I created 9 arrest categories myself. The categories are:
Total Male JuvenileTotal Female JuvenileTotal Male AdultTotal Female AdultTotal MaleTotal FemaleTotal JuvenileTotal AdultTotal ArrestsAll of these categories are based on the sums of the sex-age categories (e.g. Male under 10, Female aged 22) rather than using the provided age-race categories (e.g. adult Black, juvenile Asian). As not all agencies report the race data, my method is more accurate. These categories also make up the data in the "simple" version of the data. The "simple" file only includes the above 9 columns as the arrest data (all other columns in the data are just agency identifier columns). Because this "simple" data set need fewer columns, I include all offenses.

As the arrest data is very granular, and each category of arrest is its own column, there are dozens of columns per crime. To keep the data somewhat manageable, there are nine different files, eight which contain different crimes and the "simple" file. Each file contains the data for all years. The eight categories each have crimes belonging to a major crime category and do not overlap in crimes other than with the index offenses. Please note that the crime names provided below are not the same as the column names in the data. Due to Stata limiting column names to 32 characters maximum, I have abbreviated the crime names in the data. The files and their included crimes are:

Index Crimes
MurderRapeRobberyAggravated AssaultBurglaryTheftMotor Vehicle TheftArsonAlcohol CrimesDUIDrunkenness
LiquorDrug CrimesTotal DrugTotal Drug SalesTotal Drug PossessionCannabis PossessionCannabis SalesHeroin or Cocaine PossessionHeroin or Cocaine SalesOther Drug PossessionOther Drug SalesSynthetic Narcotic PossessionSynthetic Narcotic SalesGrey Collar and Property CrimesForgeryFraudStolen PropertyFinancial CrimesEmbezzlementTotal GamblingOther GamblingBookmakingNumbers LotterySex or Family CrimesOffenses Against the Family and Children
Other Sex Offenses
ProstitutionRapeViolent CrimesAggravated AssaultMurderNegligent ManslaughterRobberyWeapon Offenses
Other CrimesCurfewDisorderly ConductOther Non-trafficSuspicion
VandalismVagrancy
Simple
This data set has every crime and only the arrest categories that I created (see above).
If you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.
r
Inequality measures based on election data 1871 and 1892 for Swedish...
researchdata.se
demo.researchdata.se
Updated Apr 30, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sara Moricz (2019). Inequality measures based on election data 1871 and 1892 for Swedish municipalities [Dataset]. http://doi.org/10.5878/cw7b-g897
Explore at:
(429925)Available download formats
Unique identifier
https://doi.org/10.5878/cw7b-g897
Dataset updated
Apr 30, 2019
Dataset provided by
Lund University
Authors
Sara Moricz
Time period covered
1871
Area covered
Sweden
Description
The data contains inequality measures at the municipality-level for 1892 and 1871, as estimated in the PhD thesis "Institutions, Inequality and Societal Transformations" by Sara Moricz. The data also contains the source publications: 1) tabel 1 from “Bidrag till Sverige official statistik R) Valstatistik. XI. Statistiska Centralbyråns underdåniga berättelse rörande kommunala rösträtten år 1892” (biSOS R 1892) 2) tabel 1 from “Bidrag till Sverige official statistik R) Valstatistik. II. Statistiska Centralbyråns underdåniga berättelse rörande kommunala rösträtten år 1871” (biSOS R 1871)

moricz_inequality_agriculture.csv

A UTF-8 encoded .csv-file. Each row is a municipality of the agricultural sample (2222 in total). Each column is a variable.

R71muncipality_id: a unique identifier for the municipalities in the R1871 publication (the municipality name can be obtained from the source data) R92muncipality_id: a unique identifier for the municipalities in the R1892 publication (the municipality name can be obtained from the source data) agriTop1_1871: an ordinal measure (ranking) of the top 1 income share in the agricultural sector for 1871 agriTop1_1892: an ordinal measure (ranking) of the top 1 income share in the agricultural sector for 1892 highestFarm_1871: a cardinal measure of the top 1 person share in the agricultural sector for 1871 highestFarm_1871: a cardinal measure of the top 1 person share in the agricultural sector for 1892

moricz_inequality_industry.csv

A UTF-8 encoded .csv-file. Each row is a municipality of the industrial sample (1328 in total). Each column is a variable.

R71muncipality_id: see above description R92muncipality_id: see above description indTop1_1871: an ordinal measure (ranking) of the top 1 income share in the industrial sector for 1871 indTop1_1892: an ordinal measure (ranking) of the top 1 income share in the industrial sector for 1892

moricz_R1892_source_data.csv

A UTF-8 encoded .csv-file with the source data. The variables are described in the adherent codebook moricz_R1892_source_data_codebook.csv.

Contains table 1 from “Bidrag till Sverige official statistik R) Valstatistik. XI. Statistiska Centralbyråns underdåniga berättelse rörande kommunala rösträtten år 1892” (biSOS R 1892). SCB provides the scanned publication on their website. Dollar Typing Service typed and delivered the data in 2015. All numerical variables but two have been checked. This is easy to do since nearly all columns should sum up to another column. For “Folkmangd” (population) the numbers have been corrected against U1892. The highest estimate of errors in the variables is 0.005 percent (0.5 promille), calculated at cell level. The two numerical variables which have not been checked is “hogsta_fyrk_jo“ and “hogsta_fyrk_ov“, as this cannot much be compared internally in the data. According to my calculations as the worst case scenario, I have measurement errors of 0.0043 percent (0.43 promille) in those variables.

moricz_R1871_source_data.csv

A UTF-8 encoded .csv-file with the source data. The variables are described in the adherent codebook moricz_R1871_source_data_codebook.csv.

Contains table 1 from “Bidrag till Sverige official statistik R) Valstatistik. II. Statistiska Centralbyråns underdåniga berättelse rörande kommunala rösträtten år 1871” (biSOS R 1871). SCB provides the scanned publication on their website. Dollar Typing Service typed and delivered the data in 2015. The variables have been checked for accuracy, which is feasible since columns and rows should sum. The variables that most likely carry mistakes are “hogsta_fyrk_al” and “hogsta_fyrk_jo”.
g
Water column process rate measurements from water collected aboard the R/V...
data.griidc.org
search.dataone.org
Updated Jan 21, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Joseph P. Montoya (2021). Water column process rate measurements from water collected aboard the R/V Endeavor cruise EN559 in the northern Gulf of Mexico from 2015-05-31 to 2015-06-02 [Dataset]. http://doi.org/10.7266/n7-eg90-tk58
Explore at:
Unique identifier
https://doi.org/10.7266/n7-eg90-tk58
Dataset updated
Jan 21, 2021
Dataset provided by
GRIIDC
Authors
Joseph P. Montoya
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered

Description
This dataset contains water column process rate measurements from water collected aboard the R/V Endeavor cruise EN559 in the northern Gulf of Mexico from 2015-05-31 to 2015-06-02. Samples were collected during cruise EN559 using a CTD-rosette. The objective was to determine water column process rates at ECOGIG seep and other study sites. The ship departed Gulfport, Mississippi on 2015-05-29 and collected samples at ECOGIG sites before returning to Gulfport on 2015-06-21. Water samples collected with the CTD-rosette were incubated under simulated in-situ conditions after addition of 15N2 and either 13C-bicarbonate or 13C -methane tracers for 24 (DIC label) or 48h (CH4 label). Experiments were terminated by gentle pressure filtration onto a 10 µm sieve and pre-combusted GF/F filters to collect the small and large size fractions of particles. Filters were dried, then pelletized in Sn (tin) capsules for isotopic analysis. N and C isotopic abundances were measured by continuous-flow isotope ratio mass spectrometry using a Micromass Optima IRMS interfaced to a CE NA2500 elemental analyzer. Rates were calculated using a mass balance approach (Montoya et al. 1996). The dataset also includes the date, depth, and locations (latitudes and longitudes) of the sample collection.
T
Total Aflatoxin Immunoaffinity Columns (IAC) Report
marketreportanalytics.com
doc, pdf, ppt
Updated Jun 28, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Market Report Analytics (2025). Total Aflatoxin Immunoaffinity Columns (IAC) Report [Dataset]. https://www.marketreportanalytics.com/reports/total-aflatoxin-immunoaffinity-columns-iac-114837
Explore at:
pdf, ppt, docAvailable download formats
Dataset updated
Jun 28, 2025
Dataset authored and provided by
Market Report Analytics
License
https://www.marketreportanalytics.com/privacy-policyhttps://www.marketreportanalytics.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global market for Total Aflatoxin Immunoaffinity Columns (IAC) is experiencing robust growth, driven by increasing concerns over food safety and stringent regulatory frameworks mandating aflatoxin detection. The market, estimated at $250 million in 2025, is projected to expand at a Compound Annual Growth Rate (CAGR) of 7% between 2025 and 2033, reaching approximately $450 million by 2033. This growth is fueled by several factors, including the rising prevalence of aflatoxin contamination in agricultural products, particularly in developing economies with less robust food safety infrastructure. The increasing demand for rapid and accurate aflatoxin detection methods, coupled with the simplicity and cost-effectiveness of IACs compared to other techniques such as HPLC, are key drivers for market expansion. Furthermore, advancements in IAC technology, such as the development of improved antibody selectivity and higher sample throughput, are contributing to increased adoption across various sectors, including food processing, agriculture, and research institutions. The market is segmented based on column type, application, and region, with North America and Europe currently holding significant market share due to established regulatory frameworks and advanced food safety practices. However, rapidly growing economies in Asia-Pacific and Latin America represent significant growth opportunities, driven by rising food consumption and increasing awareness of foodborne illnesses. Competitive landscape analysis reveals a diverse range of players, including both established companies like Neogen, PerkinElmer, and R-Biopharm AG, and emerging regional players. The market exhibits a mix of large multinational corporations and smaller specialized companies, creating a dynamic and competitive environment. Companies are focusing on product innovation, strategic partnerships, and geographical expansion to gain market share. The adoption of IACs is anticipated to increase further with the implementation of stricter regulations and evolving consumer preferences toward safer and higher-quality food products. This suggests a positive outlook for the Total Aflatoxin Immunoaffinity Columns market with continued growth and development anticipated over the next decade.
f
Data and tools for studying isograms
figshare.com
Updated Jul 31, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Florian Breit (2017). Data and tools for studying isograms [Dataset]. http://doi.org/10.6084/m9.figshare.5245810.v1
Explore at:
application/x-sqlite3Available download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.5245810.v1
Dataset updated
Jul 31, 2017
Dataset provided by
figshare
Authors
Florian Breit
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A collection of datasets and python scripts for extraction and analysis of isograms (and some palindromes and tautonyms) from corpus-based word-lists, specifically Google Ngram and the British National Corpus (BNC).Below follows a brief description, first, of the included datasets and, second, of the included scripts.1. DatasetsThe data from English Google Ngrams and the BNC is available in two formats: as a plain text CSV file and as a SQLite3 database.1.1 CSV formatThe CSV files for each dataset actually come in two parts: one labelled ".csv" and one ".totals". The ".csv" contains the actual extracted data, and the ".totals" file contains some basic summary statistics about the ".csv" dataset with the same name.The CSV files contain one row per data point, with the colums separated by a single tab stop. There are no labels at the top of the files. Each line has the following columns, in this order (the labels below are what I use in the database, which has an identical structure, see section below):

Label Data type Description

isogramy int The order of isogramy, e.g. "2" is a second order isogram

length int The length of the word in letters

word text The actual word/isogram in ASCII

source_pos text The Part of Speech tag from the original corpus

count int Token count (total number of occurences)

vol_count int Volume count (number of different sources which contain the word)

count_per_million int Token count per million words

vol_count_as_percent int Volume count as percentage of the total number of volumes

is_palindrome bool Whether the word is a palindrome (1) or not (0)

is_tautonym bool Whether the word is a tautonym (1) or not (0)

The ".totals" files have a slightly different format, with one row per data point, where the first column is the label and the second column is the associated value. The ".totals" files contain the following data:

Label

Data type

Description

!total_1grams

int

The total number of words in the corpus

!total_volumes

int

The total number of volumes (individual sources) in the corpus

!total_isograms

int

The total number of isograms found in the corpus (before compacting)

!total_palindromes

int

How many of the isograms found are palindromes

!total_tautonyms

int

How many of the isograms found are tautonyms

The CSV files are mainly useful for further automated data processing. For working with the data set directly (e.g. to do statistics or cross-check entries), I would recommend using the database format described below.1.2 SQLite database formatOn the other hand, the SQLite database combines the data from all four of the plain text files, and adds various useful combinations of the two datasets, namely:• Compacted versions of each dataset, where identical headwords are combined into a single entry.• A combined compacted dataset, combining and compacting the data from both Ngrams and the BNC.• An intersected dataset, which contains only those words which are found in both the Ngrams and the BNC dataset.The intersected dataset is by far the least noisy, but is missing some real isograms, too.The columns/layout of each of the tables in the database is identical to that described for the CSV/.totals files above.To get an idea of the various ways the database can be queried for various bits of data see the R script described below, which computes statistics based on the SQLite database.2. ScriptsThere are three scripts: one for tiding Ngram and BNC word lists and extracting isograms, one to create a neat SQLite database from the output, and one to compute some basic statistics from the data. The first script can be run using Python 3, the second script can be run using SQLite 3 from the command line, and the third script can be run in R/RStudio (R version 3).2.1 Source dataThe scripts were written to work with word lists from Google Ngram and the BNC, which can be obtained from http://storage.googleapis.com/books/ngrams/books/datasetsv2.html and [https://www.kilgarriff.co.uk/bnc-readme.html], (download all.al.gz).For Ngram the script expects the path to the directory containing the various files, for BNC the direct path to the *.gz file.2.2 Data preparationBefore processing proper, the word lists need to be tidied to exclude superfluous material and some of the most obvious noise. This will also bring them into a uniform format.Tidying and reformatting can be done by running one of the following commands:python isograms.py --ngrams --indir=INDIR --outfile=OUTFILEpython isograms.py --bnc --indir=INFILE --outfile=OUTFILEReplace INDIR/INFILE with the input directory or filename and OUTFILE with the filename for the tidied and reformatted output.2.3 Isogram ExtractionAfter preparing the data as above, isograms can be extracted from by running the following command on the reformatted and tidied files:python isograms.py --batch --infile=INFILE --outfile=OUTFILEHere INFILE should refer the the output from the previosu data cleaning process. Please note that the script will actually write two output files, one named OUTFILE with a word list of all the isograms and their associated frequency data, and one named "OUTFILE.totals" with very basic summary statistics.2.4 Creating a SQLite3 databaseThe output data from the above step can be easily collated into a SQLite3 database which allows for easy querying of the data directly for specific properties. The database can be created by following these steps:1. Make sure the files with the Ngrams and BNC data are named “ngrams-isograms.csv” and “bnc-isograms.csv” respectively. (The script assumes you have both of them, if you only want to load one, just create an empty file for the other one).2. Copy the “create-database.sql” script into the same directory as the two data files.3. On the command line, go to the directory where the files and the SQL script are. 4. Type: sqlite3 isograms.db 5. This will create a database called “isograms.db”.See the section 1 for a basic descript of the output data and how to work with the database.2.5 Statistical processingThe repository includes an R script (R version 3) named “statistics.r” that computes a number of statistics about the distribution of isograms by length, frequency, contextual diversity, etc. This can be used as a starting point for running your own stats. It uses RSQLite to access the SQLite database version of the data described above.
b
Summary deployment data for MOCNESS 1m2 and 10m2 tows from R/V Kilo Moana...
datacart.bco-dmo.org
csv
Updated Jan 27, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jeffrey C. Drazen; Hilary G. Close; Cecelia Hannides; Brian N. Popp; Kanesa Seraphin (2016). Summary deployment data for MOCNESS 1m2 and 10m2 tows from R/V Kilo Moana KM1407, KM1418, KM1506 in the Central North Pacific, Station ALOHA from 2014-2015 (SuspendSinkPart project) [Dataset]. https://datacart.bco-dmo.org/dataset/636602
Explore at:
csv(83.61 KB)Available download formats
Dataset updated
Jan 27, 2016
Dataset provided by
Biological and Chemical Data Management Office
Authors
Jeffrey C. Drazen; Hilary G. Close; Cecelia Hannides; Brian N. Popp; Kanesa Seraphin
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Feb 19, 2014 - May 11, 2015
Area covered

Variables measured
lat, lon, net, sal, tow, temp, year, event, fluor, sigma, and 14 more
Measurement technique
MOCNESS10, MOCNESS1
Description
Summary data from 1m2 and 10m2 MOCNESS tows conducted off Hawaii. Start and end of each net deployment pressure, fluorometry,conductivity, salinity, temperature,potential temperature and potential density. These data represent 33 tows from three cruises.

DMO notes:
Changed MOCNESS time column to yrday_local and used it to get hour/min.
The first tow in each cruise has incorrect MOCNESS time when the tow crossed midnight. The hour and minute calculations are correct but for some reason the MOCNESS incremented a day when a net was opened. This is only true for the first tow in each cruise.
Added year column to help with time conversions.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Supplement 1. R code for estimating thresholds while accounting for variable detection and data for estimating thresholds for forest birds, Oregon, USA, 2007–2008.

Explore at:

htmlAvailable download formats

Unique identifier

https://doi.org/10.6084/m9.figshare.3552231.v1

Dataset updated

Jun 2, 2023

Dataset provided by

Wiley

Authors

Jay E. Jones; Andrew J. Kroll; Jack Giovanini; Steven D. Duke; Matthew G. Betts

License

CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Area covered

United States, Oregon

Description

    Stand id
    Percent cover of conifer species
    Percent cover of broadleaf species
    Percent cover of deciduous broadleaf species
    Percent cover of hardwood species
    Percent cover of hardwood species in a 2000 m radius circle around each sample stand
    Elevation (m) of stand
    Age of stand
    Year of sampling
    Visit number
    Detection of Magnolia Warbler on Visit 1
    Detection of Magnolia Warbler on Visit 2
    Detection of Orange-crowned Warbler on Visit 1
    Detection of Orange-crowned Warbler on Visit 2
    Detection of Swainson’s Thrush on Visit 1
    Detection of Swainson’s Thrush on Visit 2
    Detection of Willow Flycatcher on Visit 1
    Detection of Willow Flycatcher on Visit 2
    Detection of Wilson’s Warbler on Visit 1
    Detection of Wilson’s Warbler on Visit 1

  Checksum values are:

    Column 2 (Percent cover of conifer species – CONIFER): SUM = 5862.83
    Column 3 (Percent cover of broadleaf species – BROAD): SUM = 7043.17
    Column 4 (Percent cover of deciduous broadleaf species – DECBROAD): SUM = 5475.17
    Column 5 (Percent cover of hardwood species – HARDWOOD): SUM = 2151.96
    Column 6 (Percent cover of hardwood species in a 2000 m radius circle around each sample stand– HWD2000): SUM = 3486.07
    Column 7 (Stand elevation – ELEVM): SUM = 83240.58
    Column 8 (Stand age – AGE): SUM = 1537; NA indicates a stand was harvested in 2008
    Column 9 (Year of sampling – YEAR): SUM = 425792
    Column 11 (MGWA.1): SUM = 70
    Column 12 (MGWA.2): SUM = 71
    Column 13 (OCWA.1): SUM = 121
    Column 14 (OCWA.2): SUM = 76
    Column 15 (SWTH.1): SUM = 90
    Column 16 (SWTH.2): SUM = 95
    Column 17 (WIFL.1): SUM = 85
    Column 18 (WIFL.2): SUM = 85
    Column 19 (WIWA.1): SUM = 36
    Column 20 (WIWA.2): SUM = 37

  The Supplement_R code.r file is R source code for simulation and empirical analyses conducted in Jones et al.

Clear search

Close search

Google apps

Main menu

Supplement 1. R code for estimating thresholds while accounting for variable...

Uniform Crime Reporting (UCR) Program Data: Hate Crime Data 1992-2017

Uniform Crime Reporting (UCR) Program Data: Hate Crime Data 1992-2015

Data and scripts for the analysis of the influence of crop pollinator...

An Index to Roscher's Lexicon of Mythology

1. Introduction

1.1. General Disclaimer

1.2. Licensing

2. Description of Data

3. Version History and Change Log

Tennessee Eastman Process Simulation Dataset

Intro

Content

Acknowledgements

User Agreement

Uniform Crime Reporting (UCR) Program Data: Hate Crime Data 1992-2016

Additional Tennessee Eastman Process Simulation Data for Anomaly Detection...

[PANDORA Total column ozone] - Total column ozone (O3) measured measured...

Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race,...

Inequality measures based on election data 1871 and 1892 for Swedish...

moricz_inequality_agriculture.csv

moricz_inequality_industry.csv

moricz_R1892_source_data.csv

moricz_R1871_source_data.csv

Water column process rate measurements from water collected aboard the R/V...

Total Aflatoxin Immunoaffinity Columns (IAC) Report

Data and tools for studying isograms

Summary deployment data for MOCNESS 1m2 and 10m2 tows from R/V Kilo Moana...

Supplement 1. R code for estimating thresholds while accounting for variable detection and data for estimating thresholds for forest birds, Oregon, USA, 2007–2008.