A tracer breakthrough curve (BTC) for each sampling station is the ultimate goal of every quantitative hydrologic tracing study, and dataset size can critically affect the BTC. Groundwater-tracing data obtained using in situ automatic sampling or detection devices can produce very high-density datasets, and BTCs built from such data and stored in dataloggers can be visually cluttered with overlapping data points. The high-frequency settings available on in situ devices ensure that important BTC features, such as concentration peaks, are not missed, but the resulting dense datasets can also be difficult to interpret. More difficult still is the application of such dense datasets in solute-transport models, which may be unable to adequately reproduce tracer BTC shapes because of the overwhelming mass of data. One solution to the difficulties of analyzing, interpreting, and modeling dense datasets is the selective removal of blocks of data from the total dataset. Although blocks of BTC data can be skipped periodically (data decimation) to reduce the size and density of the dataset, skipping or deleting blocks of data may also discard the very features that the high-frequency detection settings were intended to capture. Rather than removing, reducing, or reformulating overlapping data, signal filtering and smoothing may be applied, but smoothing errors (e.g., averaging errors, outliers, and potential time shifts) need to be considered. Fitting appropriate probability distributions to tracer BTCs can describe typical BTC shapes, which usually include long tails, and recognizing which distributions apply can aid in understanding aspects of tracer migration. This dataset is associated with the following publications: Field, M. Tracer-Test Results for the Central Chemical Superfund Site, Hagerstown, Md. May 2014 -- December 2015. U.S. Environmental Protection Agency, Washington, DC, USA, 2017. Field, M. On Tracer Breakthrough Curve Dataset Size, Shape, and Statistical Distribution. Advances in Water Resources, Elsevier, New York, NY, USA, 141: 1-19, (2020).
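The decimation and smoothing trade-offs described above can be sketched on a synthetic BTC (all parameters below are hypothetical illustrations, not values from the study): periodic decimation keeps every k-th sample and risks thinning the peak, while a moving average keeps every timestamp but flattens the peak slightly, one of the smoothing errors the abstract mentions.

```python
import numpy as np

# Synthetic breakthrough curve: a lognormal-shaped pulse with a long tail,
# sampled at high frequency (hypothetical parameters for illustration only).
t = np.linspace(0.1, 100.0, 10_000)          # time since injection
btc = np.exp(-((np.log(t) - np.log(20.0)) ** 2) / 0.5) / t

# Periodic decimation: keep every k-th sample. Cheap, but a narrow peak
# can fall between retained samples.
k = 50
t_dec, btc_dec = t[::k], btc[::k]

# Moving-average smoothing: keeps all timestamps but averages the signal;
# note it also flattens the peak slightly (a smoothing error).
w = 51
kernel = np.ones(w) / w
btc_smooth = np.convolve(btc, kernel, mode="same")

# Neither reduced series can exceed the original peak concentration.
print(len(t_dec), btc_dec.max() <= btc.max(), btc_smooth.max() <= btc.max())
```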
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Lower Frederick Township, Pennsylvania, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
[Figure: Lower Frederick Township, Pennsylvania median household income, by household size (in 2022 inflation-adjusted dollars)]
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
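For working with the margin of error noted above: ACS margins of error are published at the 90% confidence level (so the standard error is MOE / 1.645), and the Census Bureau's ACS guidance gives a root-sum-of-squares approximation for the MOE of a sum of estimates. A minimal sketch; the example MOE values are hypothetical:

```python
import math

def moe_of_sum(moes):
    """Approximate margin of error for a sum of ACS estimates,
    using the root-sum-of-squares rule from Census Bureau ACS guidance."""
    return math.sqrt(sum(m * m for m in moes))

def moe_to_se(moe, z=1.645):
    """Convert a published ACS margin of error (90% confidence level)
    to a standard error."""
    return moe / z

# Hypothetical MOEs for two household-size groups being combined.
combined_moe = moe_of_sum([2500.0, 1800.0])
se = moe_to_se(2500.0)
print(round(combined_moe, 1), round(se, 1))
```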
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Lower Frederick township median household income. You can refer to the main dataset here.
The study investigated whether one week of exposure to images of bodies of different weights affects the way healthy adult women perceive their own body size.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Current understanding of animal population responses to rising temperatures is based on the assumption that biological rates such as metabolism, which governs fundamental ecological processes, scale independently with body size and temperature, despite empirical evidence for interactive effects. Here we investigate the consequences of interactive temperature- and size-scaling of vital rates for the dynamics of populations experiencing warming using a stage-structured consumer-resource model. We show that interactive scaling alters population and stage-specific responses to rising temperatures, such that warming can induce shifts in population regulation and stage-structure, influence community structure and govern population responses to mortality. Analyzing experimental data for 20 fish species, we found size-temperature interactions in intraspecific scaling of metabolic rate to be common. Given the evidence for size-temperature interactions and the ubiquity of size structure in animal populations, we argue that accounting for size-specific temperature effects is pivotal for understanding how warming affects animal populations and communities.
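The interactive scaling at issue can be illustrated with a toy metabolic-rate model: a Boltzmann-Arrhenius temperature term combined with a mass exponent that is allowed to shift with temperature. All parameter values and the functional form of the interaction are illustrative assumptions, not estimates from the study:

```python
import math

K_B = 8.617e-5  # Boltzmann constant, eV/K

def metabolic_rate(mass_g, temp_k, b0=1.0, a0=0.75, a_slope=0.0, e_act=0.63):
    """Metabolic rate with an optional size-temperature interaction:
    the mass exponent a(T) = a0 + a_slope * (T - 293.15) shifts with
    temperature (a_slope = 0 recovers independent scaling).
    Parameter values are illustrative, not from the study."""
    a = a0 + a_slope * (temp_k - 293.15)
    return b0 * mass_g ** a * math.exp(-e_act / (K_B * temp_k))

# Independent scaling: 5 K of warming raises rates by the same factor
# for a 1 g and a 100 g animal.
r_small = metabolic_rate(1.0, 298.15) / metabolic_rate(1.0, 293.15)
r_large = metabolic_rate(100.0, 298.15) / metabolic_rate(100.0, 293.15)
print(abs(r_small - r_large) < 1e-9)  # same warming response at all sizes

# Interactive scaling: the warming response now depends on body size.
ri_small = metabolic_rate(1.0, 298.15, a_slope=-0.002) / metabolic_rate(1.0, 293.15, a_slope=-0.002)
ri_large = metabolic_rate(100.0, 298.15, a_slope=-0.002) / metabolic_rate(100.0, 293.15, a_slope=-0.002)
print(ri_small > ri_large)  # larger individuals respond less strongly
```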
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Olympus VSI "multifile" test dataset
A publicly available dataset in the proprietary Olympus VSI format intended for testing file readers and other related mechanisms.
Key properties / requirements:
A .vsi file plus a subfolder structure containing one or more .ets files.
IMPORTANT: this dataset was heavily postprocessed (see the section below); it is purely meant to provide a valid example of the file format structure.
Dataset Information
Image Dimensions
Software Version
The software used to acquire and postprocess the dataset.
Postprocessing
The following steps were applied to reduce the size of the dataset:
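The required layout described above (a .vsi file plus a subfolder structure containing one or more .ets files) can be checked programmatically before attempting to read the data. A minimal sketch with hypothetical file names and a purely structural heuristic: the real format ties the .vsi to a specific sibling folder, while this sketch only checks that some sibling directory holds .ets files.

```python
import tempfile
from pathlib import Path

def looks_like_vsi_dataset(vsi_path: Path) -> bool:
    """Heuristic structural check based on the description above:
    a .vsi file accompanied by a sibling directory holding .ets files.
    This validates layout only, not file contents."""
    if vsi_path.suffix.lower() != ".vsi" or not vsi_path.is_file():
        return False
    for sibling in vsi_path.parent.iterdir():
        if sibling.is_dir() and any(sibling.rglob("*.ets")):
            return True
    return False

# Demonstrate on a synthetic layout in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sample.vsi").write_bytes(b"")
    ets_dir = root / "_sample_" / "stack1"
    ets_dir.mkdir(parents=True)
    (ets_dir / "frame_t.ets").write_bytes(b"")
    ok = looks_like_vsi_dataset(root / "sample.vsi")
print(ok)
```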
https://www.technavio.com/content/privacy-notice
AI Training Dataset Market Size 2025-2029
The AI training dataset market size is valued to increase by USD 7.33 billion, at a CAGR of 29% from 2024 to 2029. The proliferation and increasing complexity of foundational AI models will drive market growth.
Market Insights
North America dominated the market, accounting for 36% of growth during 2025-2029.
By Service Type - Text segment was valued at USD 742.60 billion in 2023
By Deployment - On-premises segment accounted for the largest market revenue share in 2023
Market Size & Forecast
Market Opportunities: USD 479.81 million
Market Future Opportunities 2024: USD 7334.90 million
CAGR from 2024 to 2029: 29%
Market Summary
The market is experiencing significant growth as businesses increasingly rely on artificial intelligence (AI) to optimize operations, enhance customer experiences, and drive innovation. The proliferation and increasing complexity of foundational AI models necessitate large, high-quality datasets for effective training and improvement. This shift from data quantity to data quality and curation is a key trend in the market. Navigating data privacy, security, and copyright complexities, however, poses a significant challenge. Businesses must ensure that their datasets are ethically sourced, anonymized, and securely stored to mitigate risks and maintain compliance. For instance, in the supply chain optimization sector, companies use AI models to predict demand, optimize inventory levels, and improve logistics. Access to accurate and up-to-date training datasets is essential for these applications to function efficiently and effectively. Despite these challenges, the benefits of AI and the need for high-quality training datasets continue to drive market growth. The potential applications of AI are vast and varied, from healthcare and finance to manufacturing and transportation. As businesses continue to explore the possibilities of AI, the demand for curated, reliable, and secure training datasets will only increase.
What will be the size of the AI Training Dataset Market during the forecast period?
The market continues to evolve, with businesses increasingly recognizing the importance of high-quality datasets for developing and refining artificial intelligence models. According to recent studies, the use of AI in various industries is projected to grow by over 40% in the next five years, creating a significant demand for training datasets. This trend is particularly relevant for boardrooms, as companies grapple with compliance requirements, budgeting decisions, and product strategy. Moreover, the importance of data labeling, feature selection, and imbalanced data handling in model performance cannot be overstated. For instance, a mislabeled dataset can lead to biased and inaccurate models, potentially resulting in costly errors. Similarly, effective feature selection algorithms can significantly improve model accuracy and reduce computational resources. Despite these challenges, advances in model compression methods, dataset scalability, and data lineage tracking are helping to address some of the most pressing issues in the market. For example, model compression techniques can reduce the size of models, making them more efficient and easier to deploy. Similarly, data lineage tracking can help ensure data consistency and improve model interpretability. In conclusion, the market is a critical component of the broader AI ecosystem, with significant implications for businesses across industries. By focusing on data quality, effective labeling, and advanced techniques for handling imbalanced data and improving model performance, organizations can stay ahead of the curve and unlock the full potential of AI.
Unpacking the AI Training Dataset Market Landscape
In the realm of artificial intelligence (AI), the significance of high-quality training datasets is indisputable. Businesses harnessing AI technologies invest substantially in acquiring and managing these datasets to ensure model robustness and accuracy. According to recent studies, up to 80% of machine learning projects fail due to insufficient or poor-quality data. Conversely, organizations that effectively manage their training data experience an average ROI improvement of 15% through cost reduction and enhanced model performance.
Distributed computing systems and high-performance computing facilitate the processing of vast datasets, enabling businesses to train models at scale. Data security protocols and privacy preservation techniques are crucial to protect sensitive information within these datasets. Reinforcement learning models and supervised learning models each have their unique applications, with the former demonstrating a 30% faster convergence rate in certain use cases.
Data annot
Since the 1940s, hydrologists have used aquifer tests to estimate the hydrogeologic properties near test wells. Results from these tests are recorded in various files, databases, reports and scientific publications. The U.S. Geological Survey (USGS), Lower Mississippi-Gulf Water Science Center (LMG) is aggregating all aquifer test results from Alabama, Arkansas, Louisiana, Mississippi and Tennessee into a single dataset that is publicly available in a machine-readable format. This dataset contains information and results from 2,245 aquifer tests compiled in the LMG-Hydrogeologic Aquifer Test Dataset - December 2020. Descriptive statistics for the December 2020 dataset are presented in Table 1 (below) and in the Summary_Readme.pdf. Additionally, this dataset contains 6 attribute tables (.txt files) with additional information for various fields, a zip file containing the geospatial data, and the companion attribute table as a .txt file. THE LMG-HYDROGEOLOGIC AQUIFER TEST DATASET – DECEMBER 2020 IS AVAILABLE IN TWO FORMATS: 1) a tab-delimited text (.txt) UTF-8 file and 2) an ESRI GIS point shapefile. FIELDS INCLUDED IN THE LMG-HYDROGEOLOGIC AQUIFER TEST DATASET – DECEMBER 2020: [a complete list of field names, their definitions and units is provided in the Summary_Readme.pdf file] Location data: USGS site identification number, local identification name, Public Land Survey System number, latitude, longitude, State and county. Well construction data: construction date, well depth, diameter of well, diameter of casing, depth to top of opening (screen) interval, depth to bottom of opening interval and length of the open interval. Aquifer data: local aquifer name and code, national aquifer name and code, top of aquifer (altitude), bottom of aquifer, and thickness of aquifer.
Groundwater test data: test date, yield/discharge, length of time associated with yield, static water level in feet below land surface, production water level in feet below land surface associated with yield, drawdown associated with yield. Hydrogeologic data: specific capacity, transmissivity, horizontal conductivity, vertical conductivity, permeability and storage coefficient. Ancillary data: method of test analysis and data source reference. DESCRIPTIONS OF ATTACHED FILES: Summary_Readme.pdf: a Portable Document Format (PDF) file with field names, definitions and units for the aquifer test dataset and the associated attribute tables. This file also contains summary statistics for aquifer tests compiled through December 2020. LMG-HydrogeologicAqfrTestDataset_Dec2020.txt: a tab-delimited, UTF-8 text file of the attribute table associated with the LMG-HydrogeologicTestData_Dec2020 geospatial dataset. AtbtTbl_AqfrCd_Readme.txt: a UTF-8 text file containing information from the National Water Information System: Help System web page about USGS groundwater codes (accessed December 4, 2019 at https://help.waterdata.usgs.gov/codes-and-parameters). AtbtTbl_FipsGeographyCodes.txt: a tab-delimited, UTF-8 text file of FIPS (Federal Information Processing Standards) codes, uniquely identifying States, counties and county equivalents in the United States. Note: to reduce the size of this file, city codes were removed (accessed January 8, 2020 at https://www.census.gov/geographies/reference-files/2017/demo/popest/2017-fips.html). AtbtTbl_LocalAqfrCodes.txt: a tab-delimited, UTF-8 text file of eight-character strings identifying local aquifers. Codes are defined by the "Catalog of Aquifer Names and Geologic Unit Codes" used by the USGS.
(accessed December 4, 2019 at https://help.waterdata.usgs.gov/aqfr_cd) AtbtTbl_NatAqfrCodes.txt: a tab-delimited, UTF-8 text file of ten-character strings identifying a National aquifer, or principal aquifer of the United States, defined as regionally extensive aquifers or aquifer systems that have the potential to be used as a source of potable water. (accessed December 4, 2019 at https://water.usgs.gov/ogw/NatlAqCode-reflist.html) AtbtTbl_TstMthdCodes.txt: a tab-delimited, UTF-8 text file of codes identifying the aquifer test analysis method when reported in the associated reference. AtbtTbl_DataRefNo.txt: a tab-delimited, UTF-8 text file of references for the source of the associated aquifer test result. CAVEAT: Some hydrogeologic test results reported in this dataset have not been through the USGS data review and approval process to receive the Director’s approval. Any such data are considered PROVISIONAL and subject to revision. PROVISIONAL data are released on the condition that neither the USGS nor the United States Government may be held liable for any damages resulting from its use. NOTE: If you have data you would like added to this dataset or have found an error, please contact the USGS so we may incorporate them into the next version of the LMG-Hydrogeologic Aquifer Test dataset. Table 1. Summary-descriptive statistics for the LMG-Hydrogeologic Aquifer Test Dataset – December 2020. [USGS, U.S.
Geological Survey; NWIS, National Water Information System; n, number of wells; std dev, standard deviation; aquifers listed by USGS-NWIS national standard aquifer name and code]

Specific capacity (gallons per minute per foot)
All well data: n=1733, max=15000, min=0.0025, mean=84, median=8.7, std dev=552
Alluvial aquifers (N100ALLUVL): n=21, max=723, min=0.98, mean=57, median=12, std dev=161
Mississippi River Valley alluvial aquifer (N100MSRVVL): n=185, max=10000, min=0.06, mean=265, median=72, std dev=864
Other aquifers (N9999OTHER): n=3, max=50, min=1.20, mean=18, median=2.1, std dev=28
Coastal lowlands aquifer system (S100CSLLWD): n=913, max=15000, min=0.05, mean=93, median=12, std dev=645
Mississippi embayment aquifer system (S100MSEMBM): n=429, max=641, min=0.01, mean=13, median=4, std dev=44
Southeastern Coastal Plain aquifer system (S100SECSLP): n=99, max=71, min=0.10, mean=6.2, median=3.7, std dev=8.7
Ozark Plateaus aquifer system (S400OZRKPL): n=30, max=16, min=0.16, mean=3.6, median=1.7, std dev=4.2
Edwards-Trinity aquifer system (S500EDRTRN): n=0
Unknown National aquifer: n=53, max=972, min=0.0025, mean=59, median=10, std dev=151

Transmissivity (square feet per day)
All well data: n=1549, max=260678, min=1.3, mean=12366, median=5080, std dev=20711
Alluvial aquifers (N100ALLUVL): n=26, max=41700, min=450, mean=9294, median=8422, std dev=8420
Mississippi River Valley alluvial aquifer (N100MSRVVL): n=146, max=171800, min=236, mean=31934, median=24431, std dev=28074
Other aquifers (N9999OTHER): n=4, max=26000, min=24, mean=8506, median=4000, std dev=11822
Coastal lowlands aquifer system (S100CSLLWD): n=703, max=260678, min=1.5, mean=15585, median=8000, std dev=23875
Mississippi embayment aquifer system (S100MSEMBM): n=456, max=36000, min=1.3, mean=4618, median=2406, std dev=6006
Southeastern Coastal Plain aquifer system (S100SECSLP): n=114, max=80000, min=5.00, mean=3652, median=1340, std dev=8838
Ozark Plateaus aquifer system (S400OZRKPL): n=36, max=4983, min=42, mean=1056, median=534, std dev=1262
Edwards-Trinity aquifer system (S500EDRTRN): n=1, max=161, min=161, mean=161, median=161
Unknown National aquifer: n=63, max=84486, min=5.9, mean=11103, median=4345, std dev=16908

Horizontal hydraulic conductivity (feet per day)
All well data: n=749, max=1077, min=0.01, mean=72, median=50, std dev=82
Alluvial aquifers (N100ALLUVL): n=6, max=321, min=39.88, mean=160, median=176, std dev=106
Mississippi River Valley alluvial aquifer (N100MSRVVL): n=46, max=400, min=6.88, mean=182, median=190, std dev=134
Other aquifers (N9999OTHER): n=4, max=269, min=92.00, mean=183, median=185, std dev=95
Coastal lowlands aquifer system (S100CSLLWD): n=268, max=1077, min=1.00, mean=93, median=81, std dev=85
Mississippi embayment aquifer system (S100MSEMBM): n=271, max=370, min=0.02, mean=54, median=43, std dev=52
Southeastern Coastal Plain aquifer system (S100SECSLP): n=109, max=230, min=0.30, mean=31, median=14, std dev=36
Ozark Plateaus aquifer system (S400OZRKPL): n=33, max=1.9, min=0.01, mean=0.54, median=0.31, std dev=0.58
Edwards-Trinity aquifer system (S500EDRTRN): n=0
Unknown National aquifer: n=12, max=267, min=16.00, mean=104, median=54, std dev=99

Permeability (gallons per day per square foot)
All well data: n=497, max=8375, min=0.12, mean=736, median=400, std dev=947
Alluvial aquifers (N100ALLUVL): n=12, max=2400, min=328, mean=1307, median=1270, std dev=602
Mississippi River Valley alluvial aquifer (N100MSRVVL): n=43, max=7891, min=110, mean=1926, median=1785, std dev=1174
Other aquifers (N9999OTHER): n=0
Coastal lowlands aquifer system (S100CSLLWD): n=263, max=8375, min=11, mean=796, median=636, std dev=973
Mississippi embayment aquifer system (S100MSEMBM): n=165, max=1300, min=0.12, mean=235, median=177, std dev=237
Southeastern Coastal Plain aquifer system (S100SECSLP): n=0
Ozark Plateaus aquifer system (S400OZRKPL): n=0
Edwards-Trinity aquifer system (S500EDRTRN): n=0
Unknown National aquifer: n=14, max=4158, min=201, mean=1390, median=1204, std dev=963

Storage coefficient (dimensionless)
All well data: n=490, max=1.62, min=6.30E-10, mean=0.0083, median=0.00051, std dev=0.081
Alluvial aquifers (N100ALLUVL): n=21, max=0.08, min=0.0002, mean=0.0053, median=0.00054, std dev=0.017
Mississippi River Valley alluvial aquifer (N100MSRVVL): n=82, max=0.09, min=0.0001, mean=0.0081, median=0.0013, std dev=0.016
Other aquifers (N9999OTHER): n=1, max=0.0006, min=0.0006, mean=0.0006, median=0.0006
Coastal lowlands aquifer system (S100CSLLWD): n=233, max=0.72, min=6.30E-10, mean=0.0054, median=0.0005, std dev=0.048
Mississippi embayment aquifer system (S100MSEMBM): n=100, max=1.62, min=0.000012, mean=0.0180, median=0.00027, std dev=0.16
Southeastern Coastal Plain aquifer system (S100SECSLP): n=16, max=0.006, min=0.00003, mean=0.0005, median=0.0002, std dev=0.0015
Ozark Plateaus aquifer system (S400OZRKPL): n=0
Edwards-Trinity aquifer system (S500EDRTRN): n=0
Unknown National aquifer: n=37, max=0.05, min=0.000078, mean=0.0062, median=0.00067, std dev=0.014

This dataset was developed as part of the U.S. Geological Survey, Mississippi Alluvial Plain Regional Water-Availability Study.
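Because the dataset is distributed as a tab-delimited UTF-8 text file, it can be read with standard tooling. A minimal sketch using Python's csv module; the column names and rows below are illustrative stand-ins, not the dataset's actual fields (the authoritative field list is in Summary_Readme.pdf):

```python
import csv
import io

# A two-row stand-in for LMG-HydrogeologicAqfrTestDataset_Dec2020.txt.
# For the real file, replace io.StringIO(sample) with
# open("LMG-HydrogeologicAqfrTestDataset_Dec2020.txt", encoding="utf-8").
sample = (
    "site_id\tnatl_aqfr_cd\ttransmissivity_ft2d\tstorage_coef\n"
    "070000001\tS100CSLLWD\t8000\t0.0005\n"
    "070000002\tS100MSEMBM\t2406\t0.00027\n"
)

# csv.DictReader handles the tab-delimited layout directly.
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))

# Example aggregation: mean transmissivity of the parsed records.
mean_t = sum(float(r["transmissivity_ft2d"]) for r in rows) / len(rows)
print(len(rows), mean_t)
```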
(1) This dataset was simulated by a high-resolution atmospheric model with a horizontal resolution of 60 km over the globe (GCM) and 20 km over Japan and its surroundings (RCM). The climate of the latter half of the 20th century is simulated for 6,000 years (3,000 years for the Japan area), and climates 1.5 K (*2), 2 K (*1) and 4 K warmer than the pre-industrial climate are simulated for 1,566, 3,240 and 5,400 years, respectively, to assess the effect of global warming. (2) The huge number of ensemble members enables statistically robust, high-accuracy estimates of future changes in extreme events such as typhoons and localized torrential downpours. In addition, this dataset provides highly reliable information on the impact of climate-change-driven natural disasters on future societies. (3) This dataset provides the climate projections on which global-warming adaptations are based in various fields, for example, disaster prevention, urban planning and environmental protection. It should enable adaptations that are consistent not only among issues but also among regions. (4) The total size of this dataset is 3 PB (3 × 10^15 bytes).
(*1) The dataset of climates 2 K warmer than the pre-industrial climate (d4PDF 2K) has been available since 10 August 2018. (*2) The dataset of climates 1.5 K warmer than the pre-industrial climate (d4PDF 1.5K) has been available since 8 February 2022.
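The value of very large ensembles for extreme-event statistics, as described in point (2), can be illustrated with empirical exceedance probabilities: with thousands of simulated years, the return period of a rare event can be estimated directly by counting exceedances, whereas a single short record often contains none at all. A sketch with synthetic Gumbel-distributed annual maxima (toy data, not actual d4PDF output):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for ensemble output: annual-maximum rainfall from
# 3000 simulated years vs. a single 50-year observational-length record.
ensemble = rng.gumbel(loc=100.0, scale=25.0, size=3000)
single = rng.gumbel(loc=100.0, scale=25.0, size=50)

def empirical_return_period(samples, threshold):
    """Return period (years) = 1 / empirical exceedance probability."""
    p_exceed = np.mean(samples > threshold)
    return float("inf") if p_exceed == 0 else 1.0 / p_exceed

threshold = 200.0  # a rare event for this toy distribution
rp_ensemble = empirical_return_period(ensemble, threshold)
rp_single = empirical_return_period(single, threshold)
print(rp_ensemble, rp_single)  # the short record may yield no exceedances
```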
Recombinant adeno-associated virus (rAAV) vectors mediate long-term gene transfer without any known toxicity. The primary limitation of rAAV has been the small size of the virion (20 nm), which only permits the packaging of 4.7 kilobases (kb) of exogenous DNA, including the promoter, the polyadenylation signal and any other enhancer elements that might be desired. Two recent reports (D Duan et al: Nat Med 2000, 6:595-598; Z Yan et al: Proc Natl Acad Sci USA 2000, 97:6716-6721) have exploited a unique feature of rAAV genomes, their ability to link together in doublets or strings, to bypass this size limitation. This technology could improve the chances for successful gene therapy of diseases like cystic fibrosis or Duchenne muscular dystrophy that lead to significant pulmonary morbidity.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Deep learning (DL) techniques have seen tremendous interest in medical imaging, particularly in the use of convolutional neural networks (CNNs) for the development of automated diagnostic tools. The facility of its non-invasive acquisition makes retinal fundus imaging particularly amenable to such automated approaches. Recent work in the analysis of fundus images using CNNs relies on access to massive datasets for training and validation, composed of hundreds of thousands of images. However, data residency and data privacy restrictions stymie the applicability of this approach in medical settings where patient confidentiality is a mandate. Here, we showcase results for the performance of DL on small datasets to classify patient sex from fundus images—a trait thought not to be present or quantifiable in fundus images until recently. Specifically, we fine-tune a Resnet-152 model whose last layer has been modified to a fully-connected layer for binary classification. We carried out several experiments to assess performance in the small dataset context using one private (DOVS) and one public (ODIR) data source. Our models, developed using approximately 2500 fundus images, achieved test AUC scores of up to 0.72 (95% CI: [0.67, 0.77]). This corresponds to a mere 25% decrease in performance despite a nearly 1000-fold decrease in the dataset size compared to prior results in the literature. Our results show that binary classification, even with a hard task such as sex categorization from retinal fundus images, is possible with very small datasets. Our domain adaptation results show that models trained with one distribution of images may generalize well to an independent external source, as in the case of models trained on DOVS and tested on ODIR. Our results also show that eliminating poor quality images may hamper training of the CNN due to reducing the already small dataset size even further. 
Nevertheless, using high quality images may be an important factor as evidenced by superior generalizability of results in the domain adaptation experiments. Finally, our work shows that ensembling is an important tool in maximizing performance of deep CNNs in the context of small development datasets.
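The reported AUC of 0.72 comes with a 95% confidence interval; on small test sets such intervals are commonly obtained by bootstrap resampling. A minimal sketch of that general procedure on synthetic labels and scores (the rank-sum AUC estimator and the resampling scheme are standard techniques, not the paper's actual code or data):

```python
import numpy as np

def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) statistic; assumes no tied scores."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return float((ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

rng = np.random.default_rng(42)
n = 500  # roughly small-dataset scale
labels = rng.integers(0, 2, size=n)
# Synthetic classifier scores that are mildly informative.
scores = labels * 0.5 + rng.normal(0.0, 1.0, size=n)

# Bootstrap: resample cases with replacement, recompute AUC each time.
boots = [
    auc(labels[idx], scores[idx])
    for idx in (rng.integers(0, n, size=n) for _ in range(1000))
]
lo, hi = np.percentile(boots, [2.5, 97.5])
point = auc(labels, scores)
print(round(point, 3), (round(lo, 3), round(hi, 3)))
```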
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Lower Heidelberg Township, Pennsylvania, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
[Figure: Lower Heidelberg Township, Pennsylvania median household income, by household size (in 2022 inflation-adjusted dollars)]
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Lower Heidelberg township median household income. You can refer to the main dataset here.
https://doi.org/10.5061/dryad.wwpzgmst6
The repository includes the following files:
AlexanderBodySize_all.csv: grasshopper body size dataset for the Gordon Alexander collection at the University of Colorado Museum of Natural History.
AlexanderBodySize_wClimate.csv: abbreviated grasshopper body size dataset with appended climate data.
HopperData_Sept2019.csv: data from grasshopper phenological surveys from Buckley et al (2021).
Levy_FemaleGradientDataGrasshopper.csv: reproductive data from Levy and Nufio (2014).
NiwotClimateFilled.csv: climate data for study sites.
Data are described in the AlexanderBodysizeData_Readme.csv file and below.
AlexanderBodySize_all.csv
| attributeName | attributeLabel | attributeDefinition | storageType | formatString | unit | mi...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Modality A: Near-Infrared (NIR)
Modality B: three colour channels (in B-G-R order)
Modality A: Fluorescence Images
Modality B: Quantitative Phase Images (QPI)
Modality A: Second Harmonic Generation (SHG)
Modality B: Bright-Field (BF)
The evaluation set, created from the above three publicly available datasets, consists of images that have undergone 4 levels of (rigid) transformations of increasing displacement. The level of transformation is determined by the size of the rotation angle θ and the displacements tx & ty, detailed in this table. Each image sample is transformed exactly once at each transformation level so that all levels have the same number of samples.
In total, it contains 864 image pairs created from the aerial dataset, 5040 image pairs created from the cytological dataset, and 536 image pairs created from the histological dataset. Each image pair consists of a reference patch \(I^{\text{Ref}}\) and its corresponding initial transformed patch \(I^{\text{Init}}\) in both modalities, along with the ground-truth transformation parameters to recover it.
Scripts to calculate the registration performance and to plot the overall results can be found in https://github.com/MIDA-group/MultiRegEval, and instructions to generate more evaluation data with different settings can be found in https://github.com/MIDA-group/MultiRegEval/tree/master/Datasets#instructions-for-customising-evaluation-data.
Metadata
In the *.zip files, each row in {Zurich,Balvan}_patches/fold[1-3]/patch_tlevel[1-4]/info_test.csv or Eliceiri_patches/patch_tlevel[1-4]/info_test.csv provides the information of an image pair as follows:
Filename: identifier(ID) of the image pair
X1_Ref: x-coordinate of the upper-left corner of reference patch IRef
Y1_Ref: y-coordinate of the upper-left corner of reference patch IRef
X2_Ref: x-coordinate of the lower-left corner of reference patch IRef
Y2_Ref: y-coordinate of the lower-left corner of reference patch IRef
X3_Ref: x-coordinate of the lower-right corner of reference patch IRef
Y3_Ref: y-coordinate of the lower-right corner of reference patch IRef
X4_Ref: x-coordinate of the upper-right corner of reference patch IRef
Y4_Ref: y-coordinate of the upper-right corner of reference patch IRef
X1_Trans: x-coordinate of the upper-left corner of transformed patch IInit
Y1_Trans: y-coordinate of the upper-left corner of transformed patch IInit
X2_Trans: x-coordinate of the lower-left corner of transformed patch IInit
Y2_Trans: y-coordinate of the lower-left corner of transformed patch IInit
X3_Trans: x-coordinate of the lower-right corner of transformed patch IInit
Y3_Trans: y-coordinate of the lower-right corner of transformed patch IInit
X4_Trans: x-coordinate of the upper-right corner of transformed patch IInit
Y4_Trans: y-coordinate of the upper-right corner of transformed patch IInit
Displacement: mean Euclidean distance between reference corner points and transformed corner points
RelativeDisplacement: the ratio of displacement to the width/height of image patch
Tx: randomly generated translation in the x-direction to synthesise the transformed patch IInit
Ty: randomly generated translation in the y-direction to synthesise the transformed patch IInit
AngleDegree: randomly generated rotation in degrees to synthesise the transformed patch IInit
AngleRad: randomly generated rotation in radian to synthesise the transformed patch IInit
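The relation between these columns can be sketched directly: apply the rotation AngleRad and the translation (Tx, Ty) to the four reference corners, then Displacement is the mean corner-to-corner Euclidean distance and RelativeDisplacement its ratio to the patch width. In this sketch, rotation about the patch centre and the patch size are assumptions for illustration; the dataset's generation scripts (linked above) define the actual convention.

```python
import numpy as np

def transform_corners(corners, tx, ty, theta_rad, centre):
    """Rigidly transform corner points: rotate by theta about `centre`
    (an assumed convention for this sketch), then translate by (tx, ty)."""
    c, s = np.cos(theta_rad), np.sin(theta_rad)
    rot = np.array([[c, -s], [s, c]])
    return (corners - centre) @ rot.T + centre + np.array([tx, ty])

# Four corners of a hypothetical square patch and a modest rigid motion.
size = 834.0  # assumed patch side length, for illustration only
ref = np.array([[0, 0], [0, size - 1], [size - 1, size - 1], [size - 1, 0]])
centre = ref.mean(axis=0)
init = transform_corners(ref, tx=10.0, ty=-5.0,
                         theta_rad=np.deg2rad(2.0), centre=centre)

# "Displacement": mean Euclidean distance between corresponding corners;
# "RelativeDisplacement": ratio to the patch width/height.
displacement = float(np.linalg.norm(init - ref, axis=1).mean())
relative = displacement / size
print(round(displacement, 2), round(relative, 4))
```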
Naming convention
zh{ID}_{iRow}_{iCol}_{ReferenceOrTransformed}.png, e.g. zh5_03_02_R.png indicates the Reference patch of the 3rd row and 2nd column cut from the image with ID zh5.
{{cellline}_{treatment}_{fieldofview}_{iFrame}}_{iRow}_{iCol}_{ReferenceOrTransformed}.png, e.g. PNT1A_do_1_f15_02_01_T.png indicates the Transformed patch of the 2nd row and 1st column cut from the image with ID PNT1A_do_1_f15.
{ID}_{ReferenceOrTransformed}.tif, e.g. 1B_A4_T.tif indicates the Transformed patch cut from the image with ID 1B_A4.
This dataset was originally produced by the authors of Is Image-to-Image Translation the Panacea for Multimodal Image Registration? A Comparative Study.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Lower Pottsgrove Township, Pennsylvania, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
Figure: Lower Pottsgrove Township, Pennsylvania median household income, by household size (in 2022 inflation-adjusted dollars)
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Lower Pottsgrove township median household income.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The Reservoir and Lake Surface Area Timeseries (ReaLSAT) dataset provides an unprecedented reconstruction of surface area variations of lakes and reservoirs at a global scale using Earth Observation (EO) data and novel machine learning techniques. The dataset provides monthly scale surface area variations (1984 to 2020) of 681,137 water bodies below 50°N with sizes greater than 0.1 square kilometers.
The dataset contains the following files:
1) ReaLSAT.zip: A shapefile that contains the reference shape of waterbodies in the dataset.
2) monthly_timeseries.zip: contains one CSV file for each water body. The CSV file provides monthly surface area variation values. The CSV files are stored in a subfolder corresponding to each 10 degree by 10 degree cell. For example, the monthly_timeseries_60_-50 folder contains CSV files of lakes that lie between 60 E and 70 E longitude and between 50 S and 40 S latitude.
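Assuming the subfolders are keyed by the lower-left (southwest) corner of each 10-degree cell, as the monthly_timeseries_60_-50 example suggests, a small helper can map a water body's coordinates to its folder name; the function name is hypothetical:

```python
import math

def cell_folder(lon, lat):
    """Return the name of the 10x10-degree folder expected to hold the
    CSV for a water body at (lon, lat), assuming folders are named after
    the cell's lower-left (southwest) corner."""
    lon0 = int(math.floor(lon / 10.0) * 10)
    lat0 = int(math.floor(lat / 10.0) * 10)
    return f"monthly_timeseries_{lon0}_{lat0}"
```

A lake at 65 E, 45 S would then fall in monthly_timeseries_60_-50, matching the example above.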
3) monthly_shapes_.zip: contains a GeoTIFF for each water body that lies within the 10 degree by 10 degree cell. Please refer to the visualization notebook for how to use these GeoTIFFs.
4) evaluation_data.zip: contains the random subsets of the dataset used for evaluation. The zip file contains a README file that describes the evaluation data.
6) generate_realsat_timeseries.ipynb: a Google Colab notebook that provides the code to generate timeseries and surface extent maps for any waterbody.
Please refer to the following papers to learn more about the processing pipeline used to create the ReaLSAT dataset:
[1] Khandelwal, Ankush, Anuj Karpatne, Praveen Ravirathinam, Rahul Ghosh, Zhihao Wei, Hilary A. Dugan, Paul C. Hanson, and Vipin Kumar. "ReaLSAT, a global dataset of reservoir and lake surface area variations." Scientific data 9, no. 1 (2022): 1-12.
[2] Khandelwal, Ankush. "ORBIT (Ordering Based Information Transfer): A Physics Guided Machine Learning Framework to Monitor the Dynamics of Water Bodies at a Global Scale." (2019).
Version Updates
Version 2.0:
extends the dataset to 2020.
provides geotiffs instead of shapefiles for individual lakes to reduce dataset size.
provides a notebook to visualize the updated dataset.
Version 1.4: added 1120 large lakes to the dataset and removed partial lakes that overlapped with these large lakes.
Version 1.3: fixed a visualization-related bug in generate_realsat_timeseries.ipynb
Version 1.2: added a Google Colab notebook that provides the code to generate timeseries and surface extent maps for any waterbody in the ReaLSAT database.
Since the 1940s, commercial, academic and government hydrologists have used aquifer tests to estimate the hydrogeologic properties of an aquifer near test wells. Results from these tests are recorded in various files, databases, reports, and scientific publications. The Lower Mississippi-Gulf (LMG)-Hydrogeologic Test dataset is an attempt to aggregate these dispersed hydrogeologic test results into a single dataset that is publicly available in a machine-readable format. The hydrogeologic values presented in the Mar2022 version of the LMG-Hydrogeologic Test Dataset were estimated by Douglas Carlson, PhD, with the Louisiana Geological Survey and Associate Professor-Research at Louisiana State University. Hydraulic conductivity estimates were made from specific capacity data using a technique developed by Bradbury and Rothschild (1985). Specific capacity values, from well pumping tests, were obtained from the Louisiana Water Well Registration Database. This Child Item contains the Mar2022 version of the LMG-Hydrogeologic Test dataset with information and results from 7527 aquifer tests. Additionally, this dataset contains 6 attribute tables (.txt files) with additional information for various fields, a zip file containing the geospatial data, a companion attribute table as a .txt file and a readme text file with definitions and descriptions of the attributes and attribute tables. The LMG-Hydrogeologic Aquifer Test dataset - Mar2022 is available in 2 formats: 1) a tab delimited text (.txt) UTF-8 file and 2) an ESRI GIS point shapefile. FIELDS INCLUDED IN THE LMG-HYDROGEOLOGIC TEST DATASET – Mar2022: [a complete list of field names, their definitions and units are listed in the Readme.txt file] Location Data: USGS site identification number, Local identification name, Public Land Survey System Number, Latitude, Longitude, State and County.
Well Construction Data: Construction date, well depth, Diameter of well, Diameter of casing, Depth to top of opening (screen) interval, Depth to bottom of opening interval and Length of opening interval. Aquifer Data: Local aquifer name and code, National aquifer name and code, Top of aquifer, Bottom of aquifer, and Thickness of aquifer. Groundwater Test Data: Test date, Yield/discharge, Length of time associated with yield, Static water-level, Production water-level associated with yield, Drawdown associated with yield. Hydrogeologic Data: Specific Capacity, Transmissivity, Horizontal Conductivity, Vertical Conductivity, Permeability and Storage Coefficient. Ancillary Data: Method of Test Analysis and Data Source Reference. DESCRIPTIONS OF ATTACHED FILES: LMG_HydrogeologicTestDataset_Mar2022.txt: is a tab delimited, UTF-8 text file of the LMG-Hydrogeologic Test Dataset Mar2022. Readme.txt: is a text (.txt) file with field names, definitions and units for the LMG-Hydrogeologic Test Dataset Mar2022 and associated attribute tables. AtbtTbl_AqfrCd_Readme.txt: Is a UTF-8 text file containing information from the National Water Information System: Help System web page about USGS groundwater codes. (accessed December 4, 2019 at https://help.waterdata.usgs.gov/codes-and-parameters) AtbtTbl_FipsGeographyCodes.txt: Is a tab delimited, UTF-8 text file of FIPS (Federal Information Processing Standards) codes, uniquely identifying states, counties and county equivalents in the United States. Note: to reduce the size of this file, City Codes were removed. (accessed January 8, 2020 at https://www.census.gov/geographies/reference-files/2017/demo/popest/2017-fips.html). AtbtTbl_LocalAqfrCodes.txt: Is a tab delimited, UTF-8 text file of eight-character strings identifying aquifers. Codes are defined by the "Catalog of Aquifer Names and Geologic Unit Codes used by the USGS."
(accessed December 4, 2019 at https://help.waterdata.usgs.gov/aqfr_cd) AtbtTbl_NatAqfrCodes.txt: Is a tab delimited, UTF-8 text file of ten-character strings identifying a National aquifer, or principal aquifer of the United States, that are defined as regionally extensive aquifers or aquifer systems that have the potential to be used as a source of potable water. (accessed December 4, 2019 at https://water.usgs.gov/ogw/NatlAqCode-reflist.html) AtbtTbl_TstMthdCodes.txt: Is a tab delimited, UTF-8 text file of codes identifying the test analysis method when reported in the associated reference. AtbtTbl_DataRefNo.txt: Is a tab delimited, UTF-8 text file of references for the source of the associated aquifer test result. CAVEAT: The hydrogeologic test results reported in this dataset have not been through the USGS data review and approval process to receive the Director’s approval. As such, the Mar2022 version of the LMG-Hydrogeologic Test dataset should be considered provisional. Provisional data are released on the condition that neither the USGS nor the United States Government may be held liable for any damages resulting from its use. This dataset was developed as part of the U.S. Geological Survey, Mississippi Alluvial Plain Regional Water-Availability Study.
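The Bradbury and Rothschild (1985) technique mentioned above builds on the Cooper-Jacob approximation, in which transmissivity T appears on both sides of T = (Q / 4πs) · ln(2.25·T·t / (rw²·S)) and must therefore be found iteratively. The sketch below shows only that core fixed-point iteration, omitting the well-loss and partial-penetration corrections of the full method; the function and parameter names are illustrative:

```python
import math

def transmissivity_from_specific_capacity(Q, s, t, rw, S, tol=1e-6, max_iter=100):
    """Iteratively estimate transmissivity T from specific-capacity data
    via the Cooper-Jacob relation T = (Q / (4*pi*s)) * ln(2.25*T*t / (rw**2 * S)).

    Q  : pumping rate (m^3/d)
    s  : drawdown (m)
    t  : pumping duration (d)
    rw : effective well radius (m)
    S  : storage coefficient (dimensionless)
    Returns T in m^2/d.
    """
    C = Q / (4.0 * math.pi * s)
    T = C  # initial guess
    for _ in range(max_iter):
        T_new = C * math.log(2.25 * T * t / (rw ** 2 * S))
        if abs(T_new - T) < tol:
            return T_new
        T = T_new
    return T
```

The iteration converges quickly because each update damps the previous error by roughly the ratio C/T.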
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The DNS exfiltration dataset was recorded in a realistic network environment. More than 50 million DNS requests were recorded on one of the ISP's DNS servers. The data in the dataset were anonymised by changing all IP addresses using an injective mapping. Features in the dataset are split into single-request and aggregate features. Single-request (DNS label-based) features can be calculated for each DNS request independently using only the textual characteristics of the request. On the other hand, aggregate features are calculated using multiple subsequent requests from one client to a particular TLD. This reduces the size of the dataset to about 35 million records. The complete list of features with descriptions can be found in the dataset_description.txt file. For all of the features based on finding English words in the request, we used about 60,000 of the most common English words. The list of used words can be found in english_words.txt. The main dataset (dataset.csv) contains regular requests and exfiltrations performed using the DNSExfiltrator and Iodine tools. An additional dataset (dataset_modified.csv) contains only exfiltrations executed with a modified DNSExfiltrator tool. Waiting times between two consecutive requests in this dataset are randomised, and the requests also have lower entropy, making detection much harder.
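As an illustration of a single-request textual feature of the kind described above (not necessarily one of the dataset's exact features), the Shannon entropy of a DNS label can be computed as:

```python
import math
from collections import Counter

def label_entropy(label):
    """Shannon entropy (bits per character) of a DNS label's characters.
    Exfiltration payloads encoded into labels tend to look random and so
    have higher entropy than ordinary host names."""
    if not label:
        return 0.0
    counts = Counter(label)
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A repetitive label like "aaaa" scores 0 bits, while a label whose characters are all distinct approaches log2 of its alphabet size, which is why the modified exfiltrator's lower-entropy requests are harder to detect.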
If you use this dataset for your research, please cite: Žiža, K., Tadić, P. & Vuletić, P. DNS exfiltration detection in the presence of adversarial attacks and modified exfiltrator behaviour. Int. J. Inf. Secur. (2023). https://doi.org/10.1007/s10207-023-00723-w
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Our new network dataset is crawled from Douban Movies (https://movie.douban.com), a website providing users' comments on movies. Each node in the network represents a movie, and each edge indicates that the movies at its two ends are co-preferenced by audiences, as provided by Douban. The network contains 31,761 nodes and 179,924 edges. We use the movie profiles to form the attributes of the nodes. First, we use "jieba" (https://github.com/fxsjy/jieba), a widely used Chinese word segmentation tool, to segment movie profiles and filter out common stop words and words that appear fewer than three times in the corpus. Then, we build a TF-IDF vector for each movie using scikit-learn and reduce the dimension to 500 via SVD. We build three downstream tasks for this Douban dataset: movie genre prediction, rating score level prediction, and popularity level prediction. The genre prediction task is a multi-label classification task; we directly use the genres of the movie provided by Douban as the labels, and each movie has at least one genre. To build the labels for rating score prediction, we rank movies by rating score and divide them into 10 classes of equal size. Similarly, we rank all movies by number of comments and divide them into three classes of equal size. For each task, we randomly sample 70% of the nodes as the training set, 10% as the validation set, and use the rest as the test set.
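The equal-size, rank-based labeling described for the rating-score and popularity tasks can be sketched as follows; the helper name is ours, not the authors':

```python
def rank_based_classes(scores, n_classes):
    """Assign each item to one of n_classes (nearly) equal-size classes
    by ranking scores in ascending order, so class 0 holds the lowest
    scores and class n_classes-1 the highest."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, idx in enumerate(order):
        labels[idx] = rank * n_classes // len(scores)
    return labels
```

For example, with scores [5, 1, 3] and three classes, the movies receive labels [2, 0, 1]: one movie per class, ordered by score.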
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
This is the dataset presented in the paper The Mountain Habitats Segmentation and Change Detection Dataset accepted for publication in the IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Beach, HI, USA, January 6-9, 2015. The full-sized images and masks along with the accompanying files and results can be downloaded here. The size of the dataset is about 2.1 GB.
The dataset is released under the Creative Commons Attribution-Non Commercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/legalcode).
The dataset documentation is hosted on GitHub at the following address: http://github.com/fjean/mhscd-dataset-doc. Direct download links to the latest revision of the documentation are provided below:
PDF format: http://github.com/fjean/mhscd-dataset-doc/raw/master/mhscd-dataset-doc.pdf
Text format: http://github.com/fjean/mhscd-dataset-doc/raw/master/mhscd-dataset-doc.rst
https://choosealicense.com/licenses/unknown/
MIT Environmental Impulse Response Dataset
The audio recordings in this dataset were originally created by the Computational Audition Lab at MIT. The source of the data can be found at: https://mcdermottlab.mit.edu/Reverb/IR_Survey.html. The audio files in the dataset have been resampled to a sampling rate of 16 kHz. This resampling was done to reduce the size of the dataset while making it more suitable for various tasks, including data augmentation. The dataset consists of 271 audio files… See the full description on the dataset page: https://huggingface.co/datasets/davidscripka/MIT_environmental_impulse_responses.
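The resampling to 16 kHz described above could be reproduced with SciPy's polyphase resampler; the sketch below is an assumption about the workflow, not the dataset author's script, and the 48 kHz source rate in the test is hypothetical:

```python
import numpy as np
from math import gcd
from scipy.signal import resample_poly

TARGET_RATE = 16000

def resample_to_16k(audio, orig_rate):
    """Resample a 1-D audio signal to 16 kHz using polyphase filtering.
    The up/down factors are reduced by their gcd so resample_poly works
    for any integer source rate."""
    g = gcd(orig_rate, TARGET_RATE)
    return resample_poly(audio, TARGET_RATE // g, orig_rate // g)
```

Polyphase resampling applies the anti-aliasing filter during rate conversion, which is why it is a common choice for downsampling impulse responses.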
A tracer breakthrough curve (BTC) for each sampling station is the ultimate goal of every quantitative hydrologic tracing study, and dataset size can critically affect the BTC. Groundwater-tracing data obtained using in situ automatic sampling or detection devices may result in very high-density data sets. Data-dense tracer BTCs obtained using in situ devices and stored in dataloggers can result in visually cluttered, overlapping data points. The relatively large amounts of data detected by high-frequency settings available on in situ devices and stored in dataloggers ensure that important tracer BTC features, such as data peaks, are not missed. However, such dense data sets can also be difficult to interpret. Even more difficult is the application of such dense data sets in solute-transport models, which may not be able to adequately reproduce tracer BTC shapes because of the overwhelming mass of data. One solution to the difficulties associated with analyzing, interpreting, and modeling dense data sets is the selective removal of blocks of data from the total dataset. Although it is possible to skip blocks of tracer BTC data periodically (data decimation) so as to lessen the size and density of the dataset, skipping or deleting blocks of data may also result in missing the important features that the high-frequency detection settings were intended to capture. Rather than removing, reducing, or reformulating overlapping data, signal filtering and smoothing may be utilized, but smoothing errors (e.g., averaging errors, outliers, and potential time shifts) need to be considered. Fitting appropriate probability distributions to tracer BTCs may be used to describe typical tracer BTC shapes, which usually include long tails. Recognizing appropriate probability distributions applicable to tracer BTCs can help in understanding some aspects of the tracer migration. This dataset is associated with the following publications: Field, M.
Tracer-Test Results for the Central Chemical Superfund Site, Hagerstown, Md. May 2014 -- December 2015. U.S. Environmental Protection Agency, Washington, DC, USA, 2017. Field, M. On Tracer Breakthrough Curve Dataset Size, Shape, and Statistical Distribution. ADVANCES IN WATER RESOURCES. Elsevier Science Ltd, New York, NY, USA, 141: 1-19, (2020).
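As a sketch of the data-decimation trade-off discussed above, one can thin a dense BTC while explicitly retaining the concentration peak; this is a simple illustration, not the method used in the cited papers:

```python
import numpy as np

def decimate_keep_peak(t, c, step):
    """Keep every `step`-th (time, concentration) sample but always retain
    the global concentration peak, so periodic decimation cannot erase the
    BTC's most important feature."""
    t = np.asarray(t)
    c = np.asarray(c)
    keep = np.zeros(len(c), dtype=bool)
    keep[::step] = True
    keep[np.argmax(c)] = True  # never drop the peak
    return t[keep], c[keep]
```

Retaining the argmax guards against the failure mode described above, where skipping blocks of data deletes exactly the peak that high-frequency sampling was meant to capture; real BTCs may also need similar protection for secondary peaks and tail points.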