First, we would like to thank the wildland fire advisory group. Their wisdom and guidance helped us build the dataset as it currently exists. Currently, there are multiple, freely available fire datasets that identify wildfire and prescribed fire burned areas across the United States. However, these datasets are all limited in some way. Their time periods may cover only a couple of decades, or they may have stopped collecting data many years ago. Their spatial footprints may be limited to a specific geographic area or agency. Their attribute data may be limited to nothing more than a polygon and a year. None of the existing datasets provides a comprehensive picture of fires that have burned throughout the last few centuries. Our dataset uses a series of both manual processes and ArcGIS Python (arcpy) scripts to merge these existing layers into a single dataset that encompasses the known wildfires and prescribed fires within the United States and certain territories. Forty different fire layers were utilized in this dataset. First, these datasets were ranked by order of observed quality (Tiers). The datasets were given a common set of attribute fields, and as many of these fields were populated as possible within each dataset. All fire layers were then merged together by their common attributes to create a merged dataset containing all fire polygons. Polygons were then processed in order of Tier (1-8) so that overlapping polygons in the same year and Tier were dissolved together. Overlapping polygons in subsequent Tiers were removed from the dataset. Attributes from the original datasets of all intersecting polygons in the same year across all Tiers were also merged, so that all attributes from all Tiers were included, but only the polygons from the highest-ranking Tier were dissolved to form the fire polygon. The resulting product (the combined dataset) has only one fire per year in a given area, with one set of attributes. While it combines wildfire data from 40 wildfire layers and therefore has more complete information on wildfires than the datasets that went into it, this dataset also has its own set of limitations. Please see the Data Quality attributes within the metadata record for additional information on this dataset's limitations. Overall, we believe this dataset is designed to be a comprehensive collection of fire boundaries within the United States and provides a more thorough and complete picture of fires across the United States when compared to the datasets that went into it.
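To make the tier-based processing concrete, here is a minimal sketch of the per-year dissolve-and-remove logic, written with geopandas as a stand-in for the authors' arcpy scripts (which are not reproduced here). The `FireYear` and `Tier` column names are assumptions, and the sketch omits the cross-Tier attribute merging described above.

```python
import geopandas as gpd
import pandas as pd
from shapely.ops import unary_union

def combine_by_tier(merged: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
    """Keep one fire per year in a given area, preferring the best (lowest) Tier."""
    kept = []
    for year, year_polys in merged.groupby("FireYear"):
        claimed = None  # geometry already claimed by a higher-ranking Tier
        for tier in sorted(year_polys["Tier"].unique()):  # Tier 1 is best
            tier_polys = year_polys[year_polys["Tier"] == tier]
            # Dissolve this Tier's polygons for the year, then explode so
            # only overlapping/touching polygons stay merged as one feature.
            dissolved = tier_polys.dissolve().explode(index_parts=False)
            if claimed is not None:
                # Overlapping polygons in subsequent Tiers are removed.
                dissolved = dissolved[~dissolved.intersects(claimed)]
            if dissolved.empty:
                continue
            kept.append(dissolved.assign(FireYear=year, Tier=tier))
            new_area = unary_union(list(dissolved.geometry))
            claimed = new_area if claimed is None else claimed.union(new_area)
    return gpd.GeoDataFrame(pd.concat(kept, ignore_index=True), crs=merged.crs)
```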
First, we would like to thank the wildland fire advisory group. Their wisdom and guidance helped us build the dataset as it currently exists. This dataset comprises two zip files. Zip File 1: The data within this zip file are composed of two wildland fire datasets. (1) A merged dataset consisting of 40 different wildfire and prescribed fire layers. The original 40 layers were all freely obtained from the internet or provided to the authors free of charge with permission to use them. The merged layers were altered to contain a consistent set of attributes, including names, IDs, and dates. This raw merged dataset contains all original polygons, many of which are duplicates of the same fire. This dataset also contains all the errors, inconsistencies, and other issues that caused some of the data to be excluded from the combined dataset. Care should be used when working with this dataset, as individual records may contain errors that can be more easily identified in the combined dataset. (2) A combined wildland fire polygon dataset composed of both wildfires and prescribed fires, ranging from the mid-1800s to the present, that was created by merging and dissolving fire information from 40 different original wildfire datasets to create one of the most comprehensive wildfire datasets available. Attributes describing fires that were reported in the various sources are also merged, including fire names, fire codes, fire IDs, fire dates, and fire causes. Zip File 2: The fire polygons were turned into 30 meter rasters representing various summary counts: (a) count of all wildland fires that burned a pixel, (b) count of wildfires that burned a pixel, (c) the first year a wildfire burned a pixel, (d) the most recent year a wildfire burned a pixel, and (e) count of prescribed fires that burned a pixel.
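As a hedged illustration of raster (a), a per-pixel fire count can be accumulated with rasterio's `MergeAlg.add`; the input path is hypothetical, and the simple 30 m grid derived from the layer's bounds is an assumption, not the authors' actual grid definition.

```python
import geopandas as gpd
import rasterio
from rasterio import features
from rasterio.enums import MergeAlg
from rasterio.transform import from_origin

fires = gpd.read_file("combined_wildland_fires.shp")  # hypothetical path
xmin, ymin, xmax, ymax = fires.total_bounds
res = 30  # 30 meter pixels, per the dataset description
width = int((xmax - xmin) / res) + 1
height = int((ymax - ymin) / res) + 1
transform = from_origin(xmin, ymax, res, res)

# MergeAlg.add accumulates 1 for every polygon covering a pixel, so the
# result is the count of fires that burned that pixel.
counts = features.rasterize(
    ((geom, 1) for geom in fires.geometry),
    out_shape=(height, width),
    transform=transform,
    merge_alg=MergeAlg.add,
    dtype="int32",
)

with rasterio.open(
    "fire_count.tif", "w", driver="GTiff", height=height, width=width,
    count=1, dtype="int32", crs=fires.crs, transform=transform,
) as dst:
    dst.write(counts, 1)
```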
Labour Relations in the United States: 1800
An abridged data format, created by Daan Jansen (IISH) and continuing earlier work by Joris Kok (IISH), was offered as an alternative in October 2020. This new version of the dataset includes only records that contain labour relations, leaving out all population data. This update also involved data cleaning (substantial for some datasets), separating male and female individuals, and removing any duplicate records. Hence, the aggregated number of people mentioned in these updated datasets should equal the total population.
The Department of State keeps a record of every filing for every incorporated business in the state of New York. This dataset contains information on all active corporations as of the last business day of the specified month and year.
https://www.icpsr.umich.edu/web/ICPSR/studies/38308/terms
This dataset presents information on historical central government revenues for 31 countries in Europe and the Americas for the period from 1800 (or independence) to 2012. The countries included are: Argentina, Australia, Austria, Belgium, Bolivia, Brazil, Canada, Chile, Colombia, Denmark, Ecuador, Finland, France, Germany (West Germany between 1949 and 1990), Ireland, Italy, Japan, Mexico, New Zealand, Norway, Paraguay, Peru, Portugal, Spain, Sweden, Switzerland, the Netherlands, the United Kingdom, the United States, Uruguay, and Venezuela. In other words, the dataset includes all South American, North American, and Western European countries with a population of more than one million, plus Australia, New Zealand, Japan, and Mexico. The dataset contains information on the public finances of central governments. To make such information comparable cross-nationally, the researchers chose to normalize nominal revenue figures in two ways: (i) as a share of the total budget, and (ii) as a share of total gross domestic product. The total tax revenue of the central state is disaggregated following the Government Finance Statistics Manual 2001 of the International Monetary Fund (IMF), which provides a classification of types of revenue and describes in detail the contents of each classification category. Given the paucity of detailed historical data and the needs of the project, the researchers combined some subcategories. First, they were interested in total tax revenue, as well as the shares of total revenue coming from direct and indirect taxes. Further, they measured two subcategories of direct taxation, namely taxes on property and income. For indirect taxes, they separated excises, consumption taxes, and customs.
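Both normalizations are simple ratios; a minimal pandas sketch with purely illustrative figures (not values from the dataset):

```python
import pandas as pd

# Illustrative nominal figures only; the dataset's real values differ.
df = pd.DataFrame({
    "country": ["Sweden", "Sweden"],
    "year": [1900, 1910],
    "customs_revenue": [49.0, 70.0],
    "total_revenue": [148.0, 214.0],
    "gdp": [2937.0, 3834.0],
})

# (i) share of the total budget, (ii) share of gross domestic product.
df["customs_share_budget"] = df["customs_revenue"] / df["total_revenue"]
df["customs_share_gdp"] = df["customs_revenue"] / df["gdp"]
print(df[["year", "customs_share_budget", "customs_share_gdp"]])
```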
The Wethersfield State Prison opened in September 1827 with the transfer of eighty-one prisoners from Newgate Prison. Modeled after the state-of-the-art Auburn State Prison in New York, Wethersfield provided solitary confinement for the prisoners and facilities for various workshops. In 1963, all prisoners from Wethersfield were transferred to a new State Prison at Somers, and two years later Wethersfield State Prison was demolished. The Warrants of Commitment, 1800-1903, contain the name of the prisoner, any known aliases, the crime, sentence, court, and date of incarceration. This information can be used to determine the probable location of court records relating to the individual prisoners. People may request a copy of a file by contacting the staff of the History & Genealogy Unit by telephone (860) 757-6580 or email. When requesting a copy of a record, please include at least the name of the individual and the date.
In 2024, the number of data compromises in the United States stood at 3,158 cases. Meanwhile, over 1.35 billion individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common: as a result of all three incidents, sensitive data is accessed by an unauthorized threat actor.

Industries most vulnerable to data breaches

Some industry sectors usually see more significant cases of private data violations than others. This is determined by the type and volume of the personal information organizations in these sectors store. In 2024, financial services, healthcare, and professional services were the three industry sectors that recorded the most data breaches. Overall, the number of data breaches in some industry sectors in the United States, such as healthcare, has gradually increased within the past few years. However, some sectors saw a decrease.

Largest data exposures worldwide

In 2020, an adult streaming website, CAM4, experienced a leakage of nearly 11 billion records. This is by far the most extensive reported data leakage. This case, though, is unique because cyber security researchers found the vulnerability before the cyber criminals did. The second-largest data breach is the Yahoo data breach, dating back to 2013. The company first reported about one billion exposed records; later, in 2017, it issued an updated figure of three billion leaked records. In March 2018, the third-biggest data breach happened, involving India's national identification database Aadhaar. As a result of this incident, over 1.1 billion records were exposed.
Abstract copyright UK Data Service and data collection copyright owner.
The European State Finance Database (ESFD) is an international collaborative research project for the collection of data in European fiscal history. There are no strict geographical or chronological boundaries to the collection, although data for this collection comprise the period between c.1200 and c.1815. The purpose of the ESFD was to establish a significant database of European financial and fiscal records. The data are drawn from the main extant sources of a number of European countries, as the evidence and the state of scholarship permit. The aim was to collect the data made available by scholars, whether drawing upon their published or unpublished archival research, or from other published material.

This metadata record describes two metrics that quantitatively measure the impact of reservoir storage on every flowline in the NHDPlus version 2 data suite (NHDPlusV2) for the conterminous United States. These metrics are computed for every 10 years from 1800 to 2015. The first metric (DamIndex_EROM.zip) estimates reservoir storage intensity in units of days based on reservoir storage in a contributing area normalized by the mean annual streamflow. This metric indicates the duration of storage impact upstream from each stream segment relative to the typical flow condition. In addition, this metric provides an assessment of the potential influence of a dam on average and low flows, because the metric estimates the number of days of flow that can be sustained by contributing area storage alone, without additional water or groundwater input. The second metric (DamIndex_PMC.zip) represents the degree of regulation of a river reach based on upstream reservoir storage relative to the 30-year average annual precipitation, as well as the upstream dam and watershed areas. This second metric provides an estimate of the capacity of the contributing area to store precipitation and is oriented to understanding how peak flows may be affected by dams throughout the flow network; this metric is dimensionless. Reservoir storage, construction date, and location data were obtained from the US Army Corps of Engineers' National Inventory of Dams (NID, 2018). The dataset in this data release also includes dam locations addressed to NHDPlusV2 (Final_NID_2018.zip). These calculations are based on the maximum NID storage, which indicates the maximum amount of water that can be stored behind each dam and therefore may overestimate the true reservoir storage impacts.
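The first metric's arithmetic can be sketched in a few lines; the unit choices (m³ for storage, m³/s for flow) and the function name are assumptions for illustration only:

```python
SECONDS_PER_DAY = 86_400.0

def storage_intensity_days(upstream_storage_m3: float,
                           mean_annual_flow_m3s: float) -> float:
    """Days of mean annual flow that upstream reservoir storage could sustain."""
    if mean_annual_flow_m3s <= 0:
        return float("inf")  # no flow: storage impact is unbounded
    return upstream_storage_m3 / (mean_annual_flow_m3s * SECONDS_PER_DAY)

# Example: 5 million m3 of upstream storage on a reach averaging 2 m3/s.
print(storage_intensity_days(5_000_000, 2.0))  # ~28.9 days
```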
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset consists of time-resolved reconstructions of ocean interior acidification from 1800 through 1994, 2004, and 2014. The basis of these reconstructions is observation-based estimates of the accumulation of anthropogenic carbon, combined with climatologies of hydrographic and biogeochemical properties in the ocean interior. Acidification trends are determined for several parameters of the marine CO2 system, namely the saturation state of aragonite (Ωarag), the carbonate ion concentration ([CO32-]), the free proton concentration ([H+]), and pH on the total scale (pHT). The underlying anthropogenic carbon concentration (ΔCant), the computed sensitivities of the four marine CO2 system parameters, and their absolute state estimates are provided as well. In addition to the standard estimate, the datasets contain 14 sensitivity cases, which are intended to assess the robustness of our acidification estimates to changes in the estimation procedure of ΔCant as well as the climatological distributions of other hydrographic properties. All estimates are provided on a horizontal grid with 1° x 1° resolution and for 28 depth layers from 0 to 3000 m. These data provide strong constraints on ocean interior acidification over the industrial era, in particular unravelling its progression since 1994.
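A hedged sketch of how the gridded sensitivities might be combined with the anthropogenic carbon field to reconstruct a first-order acidification trend; the linear relation, array shapes, and values below are assumptions, not the dataset's actual estimation procedure:

```python
import numpy as np

# Hypothetical 1° x 1° layer: accumulated anthropogenic carbon (µmol/kg)
# and the local sensitivity of pH_T to that accumulation (assumed values).
delta_cant = np.full((180, 360), 40.0)
sens_pht = np.full((180, 360), -0.0015)

# First-order reconstruction of the pH_T change since 1800 (≈ -0.06 here).
delta_pht = sens_pht * delta_cant
```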
This data set contains monthly temperature, precipitation, sea-level pressure, and station-pressure data for thousands of meteorological stations worldwide. The database was compiled from pre-existing national, regional, and global collections of data as part of the Global Historical Climatology Network (GHCN) project, the goal of which is to produce, maintain, and make available a comprehensive global surface baseline climate data set for monitoring climate and detecting climate change. It contains data from roughly 6000 temperature stations, 7500 precipitation stations, 1800 sea level pressure stations, and 1800 station pressure stations. Each station has at least 10 years of data, and 40% have more than 50 years of data. Spatial coverage is good over most of the globe, particularly for the United States and Europe. Data gaps are evident over the Amazon rainforest, the Sahara Desert, Greenland, and Antarctica.
MLRegTest is a benchmark for machine learning systems on sequence classification, which contains training, development, and test sets from 1,800 regular languages. MLRegTest organizes its languages according to their logical complexity (monadic second order, first order, propositional, or monomial expressions) and the kind of logical literals (string, tier-string, subsequence, or combinations thereof). The logical complexity and choice of literal provide a systematic way to understand different kinds of long-distance dependencies in regular languages, and therefore to understand the capacities of different ML systems to learn such long-distance dependencies. The languages were generated by creating finite-state acceptors, and the datasets were generated by sampling from these finite-state acceptors. The scripts and software used for these processes are open source and available. For details, see https://github.com/heinz-jeffrey/subregular-learning. Details are described in the arXiv preprint "MLRegTest: A Benchmark for the Machine Learning of Regular Languages".

# MLRegTest: A benchmark for the machine learning of regular languages
https://doi.org/10.5061/dryad.dncjsxm4h
MLRegTest provides training and testing data for 1800 regular languages.
This repository contains three gzipped tar archives.
> data.tar.gz (21GB)
> languages.tar.gz (4.5MB)
> models.tar.gz (76GB)
When uncompressed, these yield three directories, described in detail below.
> data (43GB)
> languages (38MB)
> models (87GB)
Languages are named according to the scheme `Sigma.Tau.class.k.t.i.plebby`, where `Sigma` is a two-digit alphabet size, `Tau` a two-digit number of salient symbols (the 'tier'), `class` the named subregular class, `k` the width of factors used (if applicable), `t` the threshold counted to (if applicable), and `i` a unique identifier. The table below unabbreviates the class names, and shows how many languages of each class there are.
| class | name ...
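As a quick illustration (not part of the benchmark's own tooling), a filename following this scheme could be parsed as below; the example name is hypothetical, and the sketch assumes all fields are present:

```python
def parse_language_name(filename: str) -> dict:
    """Split a name like '04.02.TSL.2.1.0.plebby' into its scheme components."""
    sigma, tau, cls, k, t, i, ext = filename.split(".")
    assert ext == "plebby"
    return {
        "alphabet_size": int(sigma),  # Sigma: two-digit alphabet size
        "tier_size": int(tau),        # Tau: number of salient symbols
        "class": cls,                 # named subregular class
        "factor_width": k,            # k: width of factors, if applicable
        "threshold": t,               # t: threshold counted to, if applicable
        "identifier": i,              # i: unique identifier
    }

print(parse_language_name("04.02.TSL.2.1.0.plebby"))
```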
https://www.icpsr.umich.edu/web/ICPSR/studies/37155/terms
This collection contains five modified data sets with mortality, population, and other demographic information for five American cities (Baltimore, Maryland; Boston, Massachusetts; New Orleans, Louisiana; New York City (Manhattan only), New York; and Philadelphia, Pennsylvania) from the early 19th century to the early 20th century. Mortality was represented by an annual crude death rate (deaths per 1000 population per year). The population was linearly interpolated from U.S. Census data and state census data (for Boston and New York City). All data sets include variables for year, total deaths, census populations, estimated annual linearly interpolated populations, and crude death rate. The Baltimore data set (DS0001) also provides birth and death rate variables based on race and slave status demographics, as well as a variable for stillbirths. The Philadelphia data set (DS0005) also includes variables for total births, total infant deaths, crude birth rate, and infant deaths per 1,000 live births.
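The two computations described (linear interpolation of census populations and the crude death rate per 1,000 population per year) are straightforward; a short sketch with purely illustrative numbers:

```python
import numpy as np

# Illustrative census counts for two census years (not actual data).
census_years = np.array([1850, 1860])
census_pop = np.array([136_881, 177_840])
years = np.arange(1850, 1861)

# Linearly interpolate the population between censuses.
pop_interp = np.interp(years, census_years, census_pop)

# Crude death rate: deaths per 1,000 population per year.
deaths_1855 = 4_000  # illustrative annual death count
cdr_1855 = deaths_1855 / pop_interp[years == 1855][0] * 1_000
print(round(cdr_1855, 1))  # ≈ 25.4
```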
U.S. Government Works: https://www.usa.gov/government-works
History of all business entity transactions with the Colorado Department of State (CDOS). The dataset goes back to the 1800s and contains millions of records. It is provided by the CDOS business division.
An Alerting Authority is a jurisdiction with the designated authority to alert and warn the public when there is an impending natural or human-made disaster, threat, or dangerous or missing person. Today, there are more than 1,800 federal, state, local, tribal and territorial Alerting Authorities using IPAWS to issue critical public alerts and warnings in their jurisdictions.
The Foreign Service Act of 1980 mandated a comprehensive revision to the operation of the Department of State and the personnel assigned to the US Foreign Service. As the statutory authority, the Foreign Affairs Manual (FAM) details the Department of State's regulations and policies on its structure and operations. Currently, there are over 25,000 pages of policies and procedures published in 16 volumes of the FAM and 38 corresponding sections of the Foreign Affairs Handbook (FAH). The FAM and FAH are revised as changes in the organization occur. 3 FAM 1800 contains documentation of the following administrative components:
- 1810 Family Advocacy Program (Child Abuse, Child Neglect and Domestic Violence)