71 datasets found

NCOM Region 10 Aggregation/Best Time Series
datadiscoverystudio.org
opendap
Updated Nov 21, 2018
kelly.r.wood@navy.mil; jeffery.rayburn@navy.mil (2018). NCOM Region 10 Aggregation/Best Time Series [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/5cdc21deb99c4b25bb51704b576e14c6/html
Available download formats: opendap
Dataset updated
Nov 21, 2018
Authors
kelly.r.wood@navy.mil; jeffery.rayburn@navy.mil
Area covered

Description
Best time series, taking the data from the most recent run available.
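The "best time series" rule can be illustrated with a minimal Python sketch: for every valid time, keep the value from the most recent model run that covers it. The input layout (a mapping from run time to `{valid_time: value}`) and the function name `best_time_series` are hypothetical; this shows the selection rule only, not the server's actual aggregation code.

```python
def best_time_series(runs):
    """Build a 'best' series from overlapping forecast runs.

    runs: {run_time: {valid_time: value}}; run and valid times must be
    sortable (e.g. numbers or datetimes).
    """
    best = {}
    for run_time in sorted(runs):       # oldest run first
        best.update(runs[run_time])     # newer runs overwrite overlapping valid times
    return dict(sorted(best.items()))   # order the result by valid time


runs = {
    0: {0: "run0@0", 6: "run0@6", 12: "run0@12"},
    12: {12: "run12@12", 18: "run12@18"},
}
print(best_time_series(runs))
# {0: 'run0@0', 6: 'run0@6', 12: 'run12@12', 18: 'run12@18'}
```

Because newer runs simply overwrite older entries, each valid time ends up with the value from the latest run that reaches it.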
Envestnet | Yodlee's De-Identified Online Purchase Data | Row/Aggregate...
datarade.ai
.sql, .txt
Envestnet | Yodlee, Envestnet | Yodlee's De-Identified Online Purchase Data | Row/Aggregate Level | USA Consumer Data covering 3600+ corporations | 90M+ Accounts [Dataset]. https://datarade.ai/data-products/envestnet-yodlee-s-de-identified-online-purchase-data-row-envestnet-yodlee
Available download formats: .sql, .txt
Dataset provided by
Yodlee
Envestnet (http://envestnet.com/)
Authors
Envestnet | Yodlee
Area covered
United States of America
Description
Envestnet® | Yodlee®'s Online Purchase Data (Aggregate/Row) Panels consist of de-identified, near-real-time (T+1) USA credit/debit/ACH transaction-level data, offering a wide view of the consumer activity ecosystem. The underlying data is sourced from end users leveraging the aggregation portion of the Envestnet® | Yodlee® financial technology platform.

Envestnet | Yodlee Consumer Panels (Aggregate/Row) include data relating to millions of transactions, including ticket size and merchant location. The dataset includes de-identified credit/debit card and bank transactions (such as a payroll deposit, account transfer, or mortgage payment). Our coverage offers insights into areas such as consumer, TMT, energy, REITs, internet, utilities, ecommerce, MBS, CMBS, equities, credit, commodities, FX, and corporate activity. We apply rigorous data science practices to deliver KPIs daily that are focused, relevant, and ready to put into production.

We offer free trials. Our team is available to provide support for loading, validation, sample scripts, or other services you may need to generate insights from our data.

Investors, corporate researchers, and corporates can use our data to answer key business questions such as:
- How much are consumers spending with specific merchants/brands, and how is that changing over time?
- Is the share of consumer spend at a specific merchant increasing or decreasing?
- How are consumers reacting to new products or services launched by merchants?
- For loyal customers, how is the share of spend changing over time?
- What is the company’s market share in a region for similar customers?
- Is the company’s loyal user base increasing or decreasing?
- Is the lifetime customer value increasing or decreasing?

Additional Use Cases:
- Use spending data to analyze sales/revenue broadly (sector-wide) or granularly (company-specific). Historically, our tracked consumer spend has correlated above 85% with company-reported data from thousands of firms. Users can sort and filter by many metrics and KPIs, such as sales and transaction growth rates and online or offline transactions, as well as view customer behavior within a geographic market at a state or city level.
- Reveal cohort consumer behavior to decipher long-term behavioral shifts in consumer spending. Measure market share, wallet share, loyalty, consumer lifetime value, retention, demographics, and more.
- Study the effects of inflation through metrics such as total spend, ticket size, and number of transactions.
- Seek out alpha-generating signals or manage your business strategically with essential, aggregated transaction and spending data analytics.
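Several of the metrics named above (total spend, number of transactions, ticket size, and a merchant's share of spend) are simple aggregates over transaction records. The following Python sketch computes them per period; the `(period, merchant, amount)` record layout and the function name `spend_kpis` are hypothetical and do not reflect the vendor's actual schema or methodology.

```python
from collections import defaultdict


def spend_kpis(transactions, merchant):
    """Per-period spend KPIs from (period, merchant, amount) records.

    The record layout is a hypothetical illustration; refunds or other
    negative amounts are not treated specially here.
    """
    total = defaultdict(float)           # total spend per period
    count = defaultdict(int)             # number of transactions per period
    merchant_spend = defaultdict(float)  # spend at the chosen merchant per period
    for period, name, amount in transactions:
        total[period] += amount
        count[period] += 1
        if name == merchant:
            merchant_spend[period] += amount

    kpis = {}
    for period in sorted(total):
        kpis[period] = {
            "total_spend": total[period],
            "n_transactions": count[period],
            "ticket_size": total[period] / count[period],  # average spend per transaction
            "merchant_share": merchant_spend[period] / total[period] if total[period] else 0.0,
        }
    return kpis


txs = [("2023-01", "A", 10.0), ("2023-01", "B", 30.0), ("2023-02", "A", 20.0)]
print(spend_kpis(txs, "A")["2023-01"])
```

Comparing these values across consecutive periods gives the growth rates and share trends referred to in the questions above.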

Use case categories (our data supports a wide range of use cases, and we look forward to working with new ones):
1. Market Research: Company Analysis, Company Valuation, Competitive Intelligence, Competitor Analysis, Competitor Analytics, Competitor Insights, Customer Data Enrichment, Customer Data Insights, Customer Data Intelligence, Demand Forecasting, Ecommerce Intelligence, Employee Pay Strategy, Employment Analytics, Job Income Analysis, Job Market Pricing, Marketing, Marketing Data Enrichment, Marketing Intelligence, Marketing Strategy, Payment History Analytics, Price Analysis, Pricing Analytics, Retail, Retail Analytics, Retail Intelligence, Retail POS Data Analysis, and Salary Benchmarking

Investment Research: Financial Services, Hedge Funds, Investing, Mergers & Acquisitions (M&A), Stock Picking, Venture Capital (VC)

Consumer Analysis: Consumer Data Enrichment, Consumer Intelligence

Market Data: Analytics, B2C Data Enrichment, Bank Data Enrichment, Behavioral Analytics, Benchmarking, Customer Insights, Customer Intelligence, Data Enhancement, Data Enrichment, Data Intelligence, Data Modeling, Ecommerce Analysis, Ecommerce Data Enrichment, Economic Analysis, Financial Data Enrichment, Financial Intelligence, Local Economic Forecasting, Location-based Analytics, Market Analysis, Market Analytics, Market Intelligence, Market Potential Analysis, Market Research, Market Share Analysis, Sales, Sales Data Enrichment, Sales Enablement, Sales Insights, Sales Intelligence, Spending Analytics, Stock Market Predictions, and Trend Analysis
HYCOM Surface Aggregation/Best Time Series
datadiscoverystudio.org
opendap
Updated Nov 21, 2018
kelly.r.wood@navy.mil; jeffery.rayburn@navy.mil (2018). HYCOM Surface Aggregation/Best Time Series [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/6df5daf0d39f4bb690e01fc65f9f1e08/html
Available download formats: opendap
Dataset updated
Nov 21, 2018
Authors
kelly.r.wood@navy.mil; jeffery.rayburn@navy.mil
Area covered
North Atlantic Ocean, Atlantic Ocean
Description
Best time series, taking the data from the most recent run available.
Data from: Usable observations over Europe: Evaluation of compositing...
zenodo.org
data.niaid.nih.gov
bin
Updated Jul 8, 2024
Katarzyna Ewa Lewińska; David Frantz; Ulf Leser; Patrick Hostert (2024). Usable observations over Europe: Evaluation of compositing windows for landsat and sentinel-2 time series [Dataset]. http://doi.org/10.5061/dryad.5tb2rbp94
Available download formats: bin
Unique identifier
https://doi.org/10.5061/dryad.5tb2rbp94
Dataset updated
Jul 8, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Katarzyna Ewa Lewińska; David Frantz; Ulf Leser; Patrick Hostert
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Measurement technique
We used all Landsat surface reflectance Level-2, Tier 1 (Collection 2) scenes from 1984 through 2021 and Sentinel-2 TOA reflectance Level-1C (pre-Collection-1; European Space Agency, 2021) scenes from 2016 through 2021 acquired over Europe, as available in Google Earth Engine (data accessed in June 2022; Gorelick et al., 2017). We used Sentinel-2 Level-1C data instead of Level-2A because the inherent quality data of Level-2A lack the desired scope and accuracy (Baetens et al., 2019; Coluzzi et al., 2018). In contrast, the Level-1C products are accompanied by cloud probabilities (Zupanc, 2017) that facilitate improved cloud screening. Furthermore, for cloud screening we also used Band 10 (Cirrus), which is not available at Level-2A. Because we performed data availability analyses, i.e., we tallied the daily presence/absence of usable observations, the disparity between Landsat and Sentinel-2 reflectance values was irrelevant here, and inter-sensor normalization was not needed. The difference in processing levels, however, affected cloud, shadow, and snow masking accuracy: the Sentinel-2 workflow combines several approaches with known accuracies (Skakun et al., 2022) but has not been evaluated as a whole. We acknowledge that for real-life reflectance-based applications, data from corresponding processing levels need to be used and the reflectance normalized among the sensors (Okujeni et al., 2024). We therefore recommend either preprocessing the Sentinel-2 TOA data to achieve the desired mask quality, or linking Sentinel-2 Level-2A data with Level-1C Band 10 and the relevant Cloud Probability scenes for more rigorous cloud screening. To ensure that only the highest-quality pixels entered the analysis, we applied conservative pixel-quality screening. For Landsat scenes, we excluded all pixels flagged as cloud, shadow, or snow using the inherent pixel quality bands (Foga et al., 2017; Z. Zhu & Woodcock, 2012) and discarded saturated pixels (Zhang et al., 2022).
We further used the quality bands to exclude all data gaps in the Landsat 7 acquisitions caused by the Scan Line Corrector (SLC) failure (Andréfouët et al., 2003). Although the accuracy of the inherent pixel-quality bands differs among the Landsat sensors because of differences in sensor design, and thus in the availability of thermal and cirrus-specific bands (Foga et al., 2017), the Landsat quality bands are a widely accepted standardized quality product. Finally, owing to Landsat 7's orbit drift (Qiu et al., 2021), we excluded all ETM+ scenes acquired after 31 December 2020. We analyzed the availability of usable Landsat and Sentinel-2 observations over Europe by sampling individual pixels spaced systematically every 20 km in the latitudinal and longitudinal directions. We distributed the sampling points according to the Lambert azimuthal equal-area projection (LAEA, EPSG:3035), which is the preferred projection for EU-wide products. Although LAEA is an equal-area rather than an equidistant projection, the distance distortion within our study area was mostly below 10 m, which is less than one pixel in the high-resolution Sentinel-2 bands. Systematic point sampling is used to derive overview statistics for big datasets and in nearest-neighbor-based rescaling of rasters.
The 20-km sampling interval resulted in 16,642 locations over land, ensuring good representation of the west-east and south-north climatic and phenological gradients and facilitating graphical presentation of results. For each sampled pixel, we recorded the dates of valid cloud-, shadow-, and snow-free Landsat and Sentinel-2 acquisitions. We used the information at the original resolution and assumed each sampled pixel to be a probabilistic sample of the surrounding 20 × 20 km area, making the process analogous to nearest-neighbor resampling. We excluded duplicated data entries arising from the vertical overlaps among Landsat tiles in the same row, and from vertical and horizontal overlaps among Sentinel-2 granules from the same swath. This resulted in daily data availability for 1984-2021 (1: valid observation; 0: no data or no valid observation), which we used to derive availability information for composites with aggregation periods of five, 10, 15, 20, and 25 days and one, two, three, four, six, and 12 months. The non-overlapping compositing windows divided the daily information for each calendar year into 73, 37, 24, 18, 15, 12, six, four, three, two, and one composites, respectively. We used 1 January as the starting date of the compositing window sequence for each year. When the last compositing window was shorter than half the window width, we merged it with the penultimate composite. For each data point and every aggregation period, we recorded the number of available observations and considered a composite 'successful' if at least one valid observation was available.
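The window arithmetic described above can be checked with a short Python sketch. It assumes a per-pixel list of daily 0/1 availability flags for one calendar year (a hypothetical input format; the function names are also hypothetical) and implements the rule of merging a final window shorter than half the window width into the penultimate one. Calendar-month windows need no merging because 1, 2, 3, 4, 6, and 12 all divide 12.

```python
from datetime import date


def day_windows(year, width):
    """Non-overlapping windows of `width` days starting on 1 January.

    Returns (start, end) day-of-year index pairs, end exclusive. A final
    window shorter than half the width is merged into the penultimate one.
    """
    n_days = (date(year + 1, 1, 1) - date(year, 1, 1)).days
    bounds = [(s, min(s + width, n_days)) for s in range(0, n_days, width)]
    last_start, last_end = bounds[-1]
    if len(bounds) > 1 and last_end - last_start < width / 2:
        bounds[-2:] = [(bounds[-2][0], last_end)]
    return bounds


def month_windows(year, width):
    """Windows of `width` calendar months starting in January (width divides 12)."""
    jan1 = date(year, 1, 1)
    bounds = []
    for m in range(1, 13, width):
        end = date(year + 1, 1, 1) if m + width > 12 else date(year, m + width, 1)
        bounds.append(((date(year, m, 1) - jan1).days, (end - jan1).days))
    return bounds


def composites(daily, bounds):
    """daily: 0/1 flags per day. Returns (n_valid_obs, successful) per window."""
    return [(sum(daily[s:e]), sum(daily[s:e]) >= 1) for s, e in bounds]


counts = [len(day_windows(2021, w)) for w in (5, 10, 15, 20, 25)]
counts += [len(month_windows(2021, w)) for w in (1, 2, 3, 4, 6, 12)]
print(counts)  # [73, 37, 24, 18, 15, 12, 6, 4, 3, 2, 1]
```

The printed counts match the numbers of composites per calendar year stated above; the same counts result for a leap year such as 2020.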
Description
Landsat and Sentinel-2 data archives provide ever-increasing amounts of satellite data. However, the availability of usable observations varies greatly in space and time. Pixel-based compositing, which generates temporally equidistant cloud-free synthetic images, can mitigate this temporal variability by constructing uninterrupted time series using different compositing windows. Here, we evaluated the feasibility of using compositing windows ranging from five days to one year for 1984-2021 Landsat and 2015-2021 Sentinel-2 time series to derive uninterrupted time series across Europe. We considered separate and joint use of both data archives and analyzed the spatio-temporal availability of composites during each calendar year and pixel-specific growing season across a variety of time windows, including under hypothetical data interpolation. Our results demonstrated opportunities and limitations in the available data records to support medium- and long-term analyses requiring uninterrupted time series of composites with sub-annual temporal resolution. Spatial disparities across different compositing windows provide guidance on the feasibility of workflows relying on different data densities and on the challenges in wall-to-wall analyses. Consistent time series based on composites with sub-monthly aggregation periods were mostly feasible only with the combined Landsat and Sentinel-2 archives after 2015, and even then some regions required interpolation of up to 50% of the data.
CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly...
data.niaid.nih.gov
zenodo.org
Updated Sep 30, 2024
Hynek, Karel (2024). CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13382426
Dataset updated
Sep 30, 2024
Dataset provided by
Hynek, Karel
Čejka, Tomáš
Šiška, Pavel
Koumar, Josef
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
CESNET-TimeSeries24: The dataset for network traffic forecasting and anomaly detection

The CESNET-TimeSeries24 dataset was collected by long-term monitoring of selected statistical metrics over 40 weeks for each IP address on the ISP network CESNET3 (Czech Education and Science Network). The dataset encompasses network traffic from more than 275,000 active IP addresses, assigned to a wide variety of devices, including office computers, NATs, servers, WiFi routers, honeypots, and video-game consoles found in dormitories. The dataset is also rich in network anomaly types, since it contains all types of anomalies, ensuring a comprehensive evaluation of anomaly detection methods. Last but not least, CESNET-TimeSeries24 provides traffic time series at the institutional and IP subnet levels to cover all possible anomaly detection or forecasting scopes. Overall, the time series dataset was created from 66 billion IP flows that contain 4 trillion packets carrying approximately 3.7 petabytes of data. CESNET-TimeSeries24 is a complex real-world dataset that will finally bring insights into the evaluation of forecasting models in real-world environments.

Please cite the usage of our dataset as:

Josef Koumar, Karel Hynek, Tomáš Čejka, Pavel Šiška, "CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting", arXiv e-prints (2024): https://doi.org/10.48550/arXiv.2409.18874

@misc{koumar2024cesnettimeseries24timeseriesdataset,
  title={CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting},
  author={Josef Koumar and Karel Hynek and Tomáš Čejka and Pavel Šiška},
  year={2024},
  eprint={2409.18874},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2409.18874},
}

Time series

We create evenly spaced time series for each IP address by aggregating IP flow records into time series datapoints. The created datapoints represent the behavior of IP addresses within a defined time window of 10 minutes. The vector of time-series metrics v_{ip, i} describes the IP address ip in the i-th time window. Thus, IP flows for vector v_{ip, i} are captured in time windows starting at t_i and ending at t_{i+1}. The time series are built from these datapoints.

Datapoints created by the aggregation of IP flows contain the following time-series metrics:

Simple volumetric metrics: the number of IP flows, the number of packets, and the transmitted data size (i.e. number of bytes)

Unique volumetric metrics: the number of unique destination IP addresses, the number of unique destination Autonomous System Numbers (ASNs), and the number of unique destination transport layer ports. The aggregation of unique volumetric metrics is memory-intensive, since all unique values must be stored in an array. We used a server with 41 GB of RAM, which was enough for 10-minute aggregation on the ISP network.

Ratio metrics: the ratio of UDP/TCP packets, the ratio of UDP/TCP transmitted data size, the direction ratio of packets, and the direction ratio of transmitted data size

Average metrics: the average flow duration, and the average Time To Live (TTL)
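The aggregation of flow records into one datapoint per IP address and 10-minute window can be sketched in Python as follows. The `Flow` record, its field names, the window indexing relative to the start of monitoring, and the choice to assign a flow to the window in which it starts are all assumptions for illustration, not the dataset authors' pipeline. The per-window sets used for the unique volumetric metrics show why that aggregation is memory-intensive.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_S = 600  # 10-minute aggregation window, in seconds


@dataclass
class Flow:
    """Hypothetical minimal IP flow record (real flow exports carry many more fields)."""
    ip: str           # monitored IP address
    start: float      # flow start, seconds since the start of monitoring
    dst_ip: str
    dst_asn: int
    dst_port: int
    proto: str        # "TCP" or "UDP"; other protocols are ignored by the TCP/UDP ratios
    packets: int
    n_bytes: int
    outgoing: bool    # True if the monitored IP address is the sender
    duration: float   # seconds
    ttl: float


def share(part, total):
    """part / total, or None when the denominator is zero (an assumed convention)."""
    return part / total if total else None


def aggregate(flows):
    """Map (ip, window index i) -> datapoint v_{ip,i} with the metrics listed above."""
    groups = defaultdict(list)
    for f in flows:
        # Assumption: a flow belongs to the window in which it starts.
        groups[(f.ip, int(f.start // WINDOW_S))].append(f)
    points = {}
    for key, fs in groups.items():
        pk = sum(f.packets for f in fs)
        by = sum(f.n_bytes for f in fs)
        tcp_p = sum(f.packets for f in fs if f.proto == "TCP")
        udp_p = sum(f.packets for f in fs if f.proto == "UDP")
        tcp_b = sum(f.n_bytes for f in fs if f.proto == "TCP")
        udp_b = sum(f.n_bytes for f in fs if f.proto == "UDP")
        points[key] = {
            "n_flows": len(fs),
            "n_packets": pk,
            "n_bytes": by,
            # Unique volumetric metrics: every distinct value is kept in a set.
            "n_dest_ip": len({f.dst_ip for f in fs}),
            "n_dest_asn": len({f.dst_asn for f in fs}),
            "n_dest_port": len({f.dst_port for f in fs}),
            "tcp_udp_ratio_packets": share(tcp_p, tcp_p + udp_p),  # 1 = all TCP
            "tcp_udp_ratio_bytes": share(tcp_b, tcp_b + udp_b),
            "dir_ratio_packets": share(sum(f.packets for f in fs if f.outgoing), pk),  # 1 = all outgoing
            "dir_ratio_bytes": share(sum(f.n_bytes for f in fs if f.outgoing), by),
            "avg_duration": sum(f.duration for f in fs) / len(fs),
            "avg_ttl": sum(f.ttl for f in fs) / len(fs),
        }
    return points
```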

Multiple time aggregation: The original datapoints in the dataset are aggregated over 10 minutes of network traffic. The size of the aggregation interval influences anomaly detection procedures, mainly the training speed of the detection model. However, 10-minute intervals can be too short for longitudinal anomaly detection methods. Therefore, we added two more aggregation intervals to the dataset: 1 hour and 1 day.

Time series of institutions: We identify 283 institutions inside the CESNET3 network. These time series, aggregated per institution ID, provide a view of each institution's data.

Time series of institutional subnets: We identify 548 institution subnets inside the CESNET3 network. These time series, aggregated per institution subnet ID, provide a view of each institution subnet's data.

Data Records

The file hierarchy is described below:

cesnet-timeseries24/
|- institution_subnets/
|  |- agg_10_minutes/<id_institution_subnet>.csv
|  |- agg_1_hour/<id_institution_subnet>.csv
|  |- agg_1_day/<id_institution_subnet>.csv
|  |- identifiers.csv
|- institutions/
|  |- agg_10_minutes/<id_institution>.csv
|  |- agg_1_hour/<id_institution>.csv
|  |- agg_1_day/<id_institution>.csv
|  |- identifiers.csv
|- ip_addresses_full/
|  |- agg_10_minutes/<id_ip_folder>/<id_ip>.csv
|  |- agg_1_hour/<id_ip_folder>/<id_ip>.csv
|  |- agg_1_day/<id_ip_folder>/<id_ip>.csv
|  |- identifiers.csv
|- ip_addresses_sample/
|  |- agg_10_minutes/<id_ip>.csv
|  |- agg_1_hour/<id_ip>.csv
|  |- agg_1_day/<id_ip>.csv
|  |- identifiers.csv
|- times/
|  |- times_10_minutes.csv
|  |- times_1_hour.csv
|  |- times_1_day.csv
|- ids_relationship.csv
|- weekends_and_holidays.csv

The following list describes time series data fields in CSV files:

id_time: Unique identifier for each aggregation interval within the time series, used to segment the dataset into specific time periods for analysis.

n_flows: Total number of flows observed in the aggregation interval, indicating the volume of distinct sessions or connections for the IP address.

n_packets: Total number of packets transmitted during the aggregation interval, reflecting the packet-level traffic volume for the IP address.

n_bytes: Total number of bytes transmitted during the aggregation interval, representing the data volume for the IP address.

n_dest_ip: Number of unique destination IP addresses contacted by the IP address during the aggregation interval, showing the diversity of endpoints reached.

n_dest_asn: Number of unique destination Autonomous System Numbers (ASNs) contacted by the IP address during the aggregation interval, indicating the diversity of networks reached.

n_dest_port: Number of unique destination transport layer ports contacted by the IP address during the aggregation interval, representing the variety of services accessed.

tcp_udp_ratio_packets: Ratio of packets sent using TCP versus UDP by the IP address during the aggregation interval, providing insight into the transport protocol usage pattern. This metric takes values in the interval [0, 1], where 1 means all packets are sent over TCP and 0 means all packets are sent over UDP.

tcp_udp_ratio_bytes: Ratio of bytes sent using TCP versus UDP by the IP address during the aggregation interval, highlighting the data volume distribution between protocols. This metric takes values in [0, 1] with the same rule as tcp_udp_ratio_packets.

dir_ratio_packets: Ratio of packet directions (inbound versus outbound) for the IP address during the aggregation interval, indicating the balance of traffic flow directions. This metric takes values in [0, 1], where 1 means all packets are sent in the outgoing direction from the monitored IP address and 0 means all packets are sent in the incoming direction to the monitored IP address.

dir_ratio_bytes: Ratio of byte directions (inbound versus outbound) for the IP address during the aggregation interval, showing the data volume distribution in traffic flows. This metric takes values in [0, 1] with the same rule as dir_ratio_packets.

avg_duration: Average duration of IP flows for the IP address during the aggregation interval, measuring the typical session length.

avg_ttl: Average Time To Live (TTL) of IP flows for the IP address during the aggregation interval, providing insight into the lifespan of packets.

Moreover, the time series created by re-aggregation (the 1-hour and 1-day intervals) contain the following time series metrics instead of n_dest_ip, n_dest_asn, and n_dest_port:

sum_n_dest_ip: Sum of numbers of unique destination IP addresses.

avg_n_dest_ip: The average number of unique destination IP addresses.

std_n_dest_ip: Standard deviation of numbers of unique destination IP addresses.

sum_n_dest_asn: Sum of numbers of unique destination ASNs.

avg_n_dest_asn: The average number of unique destination ASNs.

std_n_dest_asn: Standard deviation of numbers of unique destination ASNs.

sum_n_dest_port: Sum of numbers of unique destination transport layer ports.

avg_n_dest_port: The average number of unique destination transport layer ports.

std_n_dest_port: Standard deviation of numbers of unique destination transport layer ports.
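Unique counts are not additive: a destination contacted in two 10-minute windows would be counted twice if the window counts were simply summed, and an exact distinct count over a longer interval would require keeping the raw sets of values. Summarizing the per-window counts by their sum, mean, and standard deviation avoids that. The Python sketch below re-aggregates consecutive finer datapoints this way. The function name is hypothetical, the use of the population standard deviation is an assumption (the description does not say which variant is used), and the ratio and average fields are omitted because the description does not state how they are re-aggregated.

```python
from statistics import mean, pstdev

ADDITIVE = ("n_flows", "n_packets", "n_bytes")
UNIQUE = ("n_dest_ip", "n_dest_asn", "n_dest_port")


def reaggregate(points):
    """Combine consecutive finer datapoints (e.g. six 10-minute points -> one hour)."""
    out = {m: sum(p[m] for p in points) for m in ADDITIVE}  # plain counts add up
    for m in UNIQUE:
        vals = [p[m] for p in points]
        out["sum_" + m] = sum(vals)      # upper bound on the true distinct count
        out["avg_" + m] = mean(vals)
        out["std_" + m] = pstdev(vals)   # assumption: population standard deviation
    return out


hour = reaggregate([
    {"n_flows": 1, "n_packets": 2, "n_bytes": 3, "n_dest_ip": 2, "n_dest_asn": 1, "n_dest_port": 4},
    {"n_flows": 3, "n_packets": 5, "n_bytes": 7, "n_dest_ip": 4, "n_dest_asn": 1, "n_dest_port": 4},
])
print(hour["sum_n_dest_ip"], hour["avg_n_dest_ip"], hour["std_n_dest_ip"])  # 6 3 1.0
```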

In addition, the identifiers.csv file in each dataset type contains the IDs of the time series present in that part of the dataset. The ids_relationship.csv file describes the relationships between IP addresses, institutions, and institution subnets, and weekends_and_holidays.csv lists the non-working days in the Czech Republic.
Data from: A consistent data model for different data granularity in control...
figshare.com
tandf.figshare.com
txt
Updated May 30, 2023
Scott D. Grimshaw (2023). A consistent data model for different data granularity in control charts [Dataset]. http://doi.org/10.6084/m9.figshare.19829476.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.19829476.v1
Dataset updated
May 30, 2023
Dataset provided by
Taylor & Francis
Authors
Scott D. Grimshaw
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
After a long-running show was canceled, control charts are used to identify if and when viewing drops. Daily viewing, the finest granularity, is highly autocorrelated, so its control charts use residuals from a seasonal ARIMA model. For coarser-granularity data (weekly and monthly viewing), an approximate AR model is derived so that it is consistent with the finest-granularity model. With the proposed approach, the granular-data control charts use a longer-memory model, which reduces the number of false alarms compared with control charts constructed by treating the granular data as a different measurement.
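The idea of charting model residuals rather than raw, autocorrelated observations can be sketched in Python. To keep the example self-contained, it swaps the seasonal ARIMA model for a plain AR(1) model fitted by least squares, so it illustrates the residual-chart technique only, not the author's model. Limits are set at ±k standard deviations of the in-sample residuals (k = 3 by default); the function names and data are hypothetical.

```python
from statistics import mean, pstdev


def fit_ar1(x):
    """Least-squares fit of x[t] = c + phi * x[t-1] + e[t]; returns (c, phi, residuals)."""
    prev, curr = x[:-1], x[1:]
    pbar, cbar = mean(prev), mean(curr)
    sxx = sum((p - pbar) ** 2 for p in prev)
    phi = sum((p - pbar) * (c - cbar) for p, c in zip(prev, curr)) / sxx
    c0 = cbar - phi * pbar
    residuals = [c - (c0 + phi * p) for p, c in zip(prev, curr)]
    return c0, phi, residuals


def residual_chart(train, monitor, k=3.0):
    """Return indices in `monitor` whose one-step-ahead residual lies outside +/- k sigma."""
    c0, phi, residuals = fit_ar1(train)
    sigma = pstdev(residuals)  # residuals average to 0, the chart's center line
    alarms, last = [], train[-1]
    for t, value in enumerate(monitor):
        if abs(value - (c0 + phi * last)) > k * sigma:
            alarms.append(t)
        last = value
    return alarms


train = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16]
print(residual_chart(train, [15, 5]))  # [1]: the sudden drop to 5 is flagged
```

Because each residual is the one-step-ahead prediction error, the chart flags departures from the fitted dynamics rather than the ordinary swings that autocorrelation alone would produce.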
NCOM SFC8 Hindcast Aggregation/Best Time Series
datadiscoverystudio.org
opendap
Updated May 4, 2018
USA/NAVY/NAVO (2018). NCOM SFC8 Hindcast Aggregation/Best Time Series [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/128083e1df8b454280d17ca40a93d412/html
Available download formats: opendap
Dataset updated
May 4, 2018
Authors
USA/NAVY/NAVO
Area covered
Africa
Description
Best time series, taking the data from the most recent run available.
Data from: Using partial aggregation in Spatial Capture Recapture
zenodo.org
data.niaid.nih.gov
bin
Updated May 28, 2022
Cyril Milleret; Pierre Dupont; Henrik Brøseth; Jonas Kindberg; J. Andrew Royle; Richard Bischof (2022). Data from: Using partial aggregation in Spatial Capture Recapture [Dataset]. http://doi.org/10.5061/dryad.pd612qp
Available download formats: bin
Unique identifier
https://doi.org/10.5061/dryad.pd612qp
Dataset updated
May 28, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Cyril Milleret; Pierre Dupont; Henrik Brøseth; Jonas Kindberg; J. Andrew Royle; Richard Bischof
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Spatial capture-recapture (SCR) models are commonly used for analyzing data collected using non-invasive genetic sampling (NGS). Opportunistic NGS often leads to detections that do not occur at discrete detector locations. Therefore, spatial aggregation of individual detections into fixed detectors (e.g. center of grid cells) is an option to increase computing speed of SCR analyses. However, it may reduce precision and accuracy of parameter estimations.

Using simulations, we explored the impact that spatial aggregation of detections has on a trade-off between computing time and parameter precision and bias, under a range of biological conditions. We used three different observation models: the commonly used Poisson and Bernoulli models, as well as a novel way to partially aggregate detections (Partially Aggregated Binary model (PAB)) to reduce the loss of information after aggregating binary detections. The PAB model divides detectors into K subdetectors and models the frequency of subdetectors with more than one detection as a binomial response with a sample size of K. Finally, we demonstrate the consequences of aggregation and the use of the PAB model using NGS data from the monitoring of wolverine (Gulo gulo) in Norway.

Spatial aggregation of detections, while reducing computation time, does indeed incur costs in terms of reduced precision and accuracy, especially for the parameters of the detection function. SCR models estimated abundance with low bias (< 10%) even at high degrees of aggregation, but only for the Poisson and PAB models. Overall, the cost of aggregation is mitigated when using the Poisson and PAB models. At the same level of aggregation, the PAB observation model outperforms the Bernoulli model in terms of accuracy of estimates, while offering the benefits of a binary observation model (fewer assumptions about the underlying ecological process) over the count-based model.

We recommend that detector spacing after aggregation not exceed 1.5 times the scale parameter of the detection function in order to limit bias. We recommend using the PAB observation model when performing spatial aggregation of binary data, as it can mitigate the cost of aggregation compared with the Bernoulli model.
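The partially aggregated binary (PAB) observation model can be sketched in Python for a single individual whose activity center is treated as known. This is a simplified illustration, not the authors' implementation: a full SCR model also estimates activity centers and abundance, which this sketch does not attempt. It assumes square detectors of width `cell`, each split into `k_side × k_side` subdetectors (so K = k_side²), counts the subdetectors that recorded at least one detection (our reading of the description above), and scores those counts with a Binomial(K, p) likelihood under a half-normal detection function p = p0·exp(−d²/(2σ²)). All function names are hypothetical; `spacing_ok` encodes the 1.5σ spacing guideline stated above.

```python
import math
from collections import defaultdict


def pab_counts(detections, cell, k_side):
    """Aggregate one individual's (x, y) detections into square detectors of width
    `cell`, each split into k_side x k_side subdetectors (K = k_side ** 2).
    Returns {detector (col, row): number of subdetectors with >= 1 detection}."""
    sub = cell / k_side
    hits = defaultdict(set)
    for x, y in detections:
        col, row = math.floor(x / cell), math.floor(y / cell)
        # Subdetector index within the detector, clamped against float edge cases.
        sx = min(int((x - col * cell) / sub), k_side - 1)
        sy = min(int((y - row * cell) / sub), k_side - 1)
        hits[(col, row)].add((sx, sy))
    return {det: len(s) for det, s in hits.items()}


def pab_loglik(counts, detectors, activity_center, p0, sigma, K):
    """Sum of Binomial(K, p_j) log-probabilities with half-normal p_j.
    `detectors` maps every detector id to its center; undetected ones have y = 0."""
    ll = 0.0
    for det, center in detectors.items():
        d = math.dist(center, activity_center)
        p = p0 * math.exp(-d * d / (2 * sigma * sigma))
        y = counts.get(det, 0)
        ll += math.log(math.comb(K, y)) + (K - y) * math.log1p(-p)
        if y:
            ll += y * math.log(p)
    return ll


def spacing_ok(cell, sigma):
    """Guideline from the study: aggregated detector spacing <= 1.5 * sigma."""
    return cell <= 1.5 * sigma


counts = pab_counts([(0.1, 0.1), (0.2, 0.15), (0.7, 0.2), (1.3, 0.4)], cell=1.0, k_side=2)
print(counts)  # {(0, 0): 2, (1, 0): 1}
```

Compared with a Bernoulli model on the aggregated detector, which only records whether the detector had any detection, the binomial count over K subdetectors retains part of the information lost by aggregation.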
Monthly aggregated GLASS FAPAR V6 (250 m): 50th percentile monthly...
zenodo.org
png, tiff
Updated Jul 11, 2024
Yu-Feng Ho; Xuemeng Tian; Davide Consoli; Julia Hackländer; Tomislav Hengl (2024). Monthly aggregated GLASS FAPAR V6 (250 m): 50th percentile monthly time-series (2009) [Dataset]. http://doi.org/10.5281/zenodo.8417513
Available download formats: tiff, png
Unique identifier
https://doi.org/10.5281/zenodo.8417513
Dataset updated
Jul 11, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Yu-Feng Ho; Xuemeng Tian; Davide Consoli; Julia Hackländer; Tomislav Hengl
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Time period covered
Mar 1, 2000 - Dec 31, 2021
Description
List of Subdatasets:

Long-term data: 2000-2021

5th percentile (p05) monthly time-series: 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021

50th percentile (p50) monthly time-series: 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021

95th percentile (p95) monthly time-series: 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021

General Description

The monthly aggregated Fraction of Absorbed Photosynthetically Active Radiation (FAPAR) dataset is derived from the 250-m, 8-day GLASS V6 FAPAR product. GLASS V6 FAPAR is derived from Moderate Resolution Imaging Spectroradiometer (MODIS) reflectance and LAI data, using several other FAPAR products (MODIS Collection 6, GLASS FAPAR V5, and PROBA-V1 FAPAR) to train a bidirectional long short-term memory (Bi-LSTM) model that estimates FAPAR. The dataset spans March 2000 to December 2021 and covers the globe except Antarctica. It can be used in many applications, such as land degradation modeling, land productivity mapping, and land potential mapping. The dataset includes:

Long-term:

Derived from the monthly time-series, this layer provides a linear trend model for the p95 variable: slope beta mean (p95.beta_m), p-value for beta (p95.beta_pv), intercept alpha mean (p95.alpha_m), p-value for alpha (p95.alpha_pv), and coefficient of determination R² (p95.r2_m).

Monthly time-series:

Monthly aggregation with three standard statistics: 5th percentile (p05), median (p50), and 95th percentile (p95). For each month, we aggregate all composites within that month plus one composite before and one after, ending up with 5 to 6 composites per month depending on how many composites fall within that month.
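The window rule can be sketched as follows for a single pixel. Assumptions: composites are given as (date, value) pairs sorted by date, percentiles use linear interpolation between order statistics (NumPy's default method; the method actually used is not stated), and the synthetic 8-day dates in the example ignore the yearly restart of real composite schedules. All function names are hypothetical.

```python
from datetime import date, timedelta


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    s = sorted(values)
    if not s:
        raise ValueError("empty input")
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def monthly_window(composites, year, month):
    """Values of composites dated in (year, month) plus one composite before and one after.

    composites: list of (date, value) pairs sorted by date.
    """
    inside = [i for i, (d, _) in enumerate(composites) if (d.year, d.month) == (year, month)]
    if not inside:
        return []
    first = max(inside[0] - 1, 0)
    last = min(inside[-1] + 1, len(composites) - 1)
    return [value for _, value in composites[first:last + 1]]


def monthly_stats(composites, year, month):
    """p05, p50, and p95 of the month's window (assumed interpolation method)."""
    window = monthly_window(composites, year, month)
    return {name: percentile(window, q) for name, q in (("p05", 5), ("p50", 50), ("p95", 95))}


# Synthetic 8-day composites starting 2000-01-01 (hypothetical values)
comps = [(date(2000, 1, 1) + timedelta(days=8 * k), 0.01 * k) for k in range(46)]
print(len(monthly_window(comps, 2000, 3)))  # 6: four March composites plus two neighbours
```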

Data Details

Time period: March 2000 – December 2021

Type of data: Fraction of Absorbed Photosynthetically Active Radiation (FAPAR)

How the data was collected or derived: Derived from the 250 m, 8-day GLASS V6 FAPAR product using Python on a local HPC system. The time-series analysis was computed with the Scikit-map Python package.

Statistical methods used: for the long-term layer, Ordinary Least Squares (OLS) regression on the p95 monthly variable; for the monthly time-series, the 5th, 50th, and 95th percentiles.

Limitations or exclusions in the data: The dataset does not include data for Antarctica.

Coordinate reference system: EPSG:4326

Bounding box (Xmin, Ymin, Xmax, Ymax): (-180.00000, -62.0008094, 179.9999424, 87.37000)

Spatial resolution: 1/480 decimal degrees ≈ 0.00208333° (nominally 250 m)

Image size: 172,800 × 71,698 pixels

File format: Cloud Optimized GeoTIFF (COG).
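The bounding box, pixel size, and image size are mutually consistent, as the arithmetic below checks: about 360° × 480 pixels per degree gives 172,800 columns, and (87.37 + 62.0008094)° × 480 ≈ 71,698 rows. The constants are copied from the Data Details above; the pixel lookup assumes row 0 at the northern edge, as in a north-up GeoTIFF. Note that 1/480° is about 232 m at the equator, so "250m" is a nominal label.

```python
import math

# Constants copied from the Data Details; row 0 at the northern edge is an assumption.
PIXELS_PER_DEGREE = 480
XMIN, YMIN, XMAX, YMAX = -180.0, -62.0008094, 179.9999424, 87.37


def image_size():
    """Width and height (pixels) implied by the bounding box and 1/480-degree pixels."""
    width = round((XMAX - XMIN) * PIXELS_PER_DEGREE)
    height = round((YMAX - YMIN) * PIXELS_PER_DEGREE)
    return width, height


def lonlat_to_pixel(lon, lat):
    """(column, row) of the pixel containing (lon, lat) in EPSG:4326."""
    if not (XMIN <= lon <= XMAX and YMIN <= lat <= YMAX):
        raise ValueError("point outside the dataset extent")
    col = math.floor((lon - XMIN) * PIXELS_PER_DEGREE)
    row = math.floor((YMAX - lat) * PIXELS_PER_DEGREE)
    return col, row


print(image_size())               # (172800, 71698)
print(lonlat_to_pixel(0.0, 0.0))  # (86400, 41937)
```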

Support

If you discover a bug, artifact, or inconsistency, or if you have a question, please raise a GitHub issue: https://github.com/Open-Earth-Monitor/Global_FAPAR_250m/issues

Reference

Hackländer, J., Parente, L., Ho, Y.-F., Hengl, T., Simoes, R., Consoli, D., Şahin, M., Tian, X., Herold, M., Jung, M., Duveiller, G., Weynants, M., Wheeler, I., (2023?) "Land potential assessment and trend-analysis using 2000–2021 FAPAR monthly time-series at 250 m spatial resolution", submitted to PeerJ, preprint available at: https://doi.org/10.21203/rs.3.rs-3415685/v1

Name convention

To ensure consistency and ease of use across and within projects, we follow the standard Open-Earth-Monitor file-naming convention. The convention uses 10 fields that describe important properties of the data, so users can search files and prepare data analyses without needing to open them. The fields are:

generic variable name: fapar = Fraction of Absorbed Photosynthetically Active Radiation

variable procedure combination: essd.lstm = Earth System Science Data with bidirectional long short-term memory (Bi-LSTM)

Position in the probability distribution / variable type: p05/p50/p95 = 5th/50th/95th percentile

Spatial support: 250m

Depth reference: s = surface

Time reference begin time: 20000301 = 2000-03-01

Time reference end time: 20211231 = 2021-12-31

Bounding box: go = global (without Antarctica)

EPSG code: epsg.4326 = EPSG:4326

Version code: v20230628 = 2023-06-28 (creation date)
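A sketch of building and parsing names from the ten fields. It assumes the fields are joined with underscores and that files end in .tif; the separator and extension are assumptions rather than details quoted from the convention above, and OemcName is a hypothetical helper. A field value that itself contains an underscore (such as the long-term layer name p95.beta_m) does not round-trip through this naive split, so parse rejects it.

```python
from dataclasses import astuple, dataclass, fields


@dataclass(frozen=True)
class OemcName:
    """Hypothetical helper; the "_" separator and ".tif" extension are assumptions."""
    variable: str         # generic variable name, e.g. "fapar"
    procedure: str        # variable procedure combination, e.g. "essd.lstm"
    distribution: str     # position in the distribution / variable type, e.g. "p50"
    spatial_support: str  # e.g. "250m"
    depth: str            # depth reference, e.g. "s" (surface)
    begin: str            # time reference begin, YYYYMMDD
    end: str              # time reference end, YYYYMMDD
    bbox: str             # bounding box code, e.g. "go"
    epsg: str             # e.g. "epsg.4326"
    version: str          # e.g. "v20230628"

    def filename(self, ext=".tif"):
        return "_".join(astuple(self)) + ext

    @classmethod
    def parse(cls, name, ext=".tif"):
        stem = name[: -len(ext)] if name.endswith(ext) else name
        parts = stem.split("_")
        if len(parts) != len(fields(cls)):
            # A field value that itself contains "_" cannot be split unambiguously
            raise ValueError(f"expected {len(fields(cls))} fields, got {len(parts)}")
        return cls(*parts)


name = OemcName("fapar", "essd.lstm", "p50", "250m", "s",
                "20000301", "20211231", "go", "epsg.4326", "v20230628")
print(name.filename())
```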
Australia Aggregate Monthly Hours Worked: Trend: Part Time: Female
ceicdata.com
Updated Mar 19, 2025
Cite
Australia Aggregate Monthly Hours Worked: Trend: Part Time: Female [Dataset]. https://www.ceicdata.com/en/australia/aggregate-monthly-hours-worked/aggregate-monthly-hours-worked-trend-part-time-female
Dataset updated
Mar 19, 2025
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Feb 1, 2024 - Jan 1, 2025
Area covered
Australia
Variables measured
Hours Worked
Description
Australia Aggregate Monthly Hours Worked: Trend: Part Time: Female was reported at 225,972.411 Hour th (thousand hours) in Jan 2025, up from 225,859.486 Hour th in Dec 2024. The series is updated monthly; across 559 observations from Jul 1978 to Jan 2025, its median is 123,113.874 Hour th. It reached an all-time high of 225,972.411 Hour th in Jan 2025 and a record low of 46,197.368 Hour th in Jul 1978. The series remains active in CEIC and is reported by the Australian Bureau of Statistics. It is categorized under Global Database’s Australia – Table AU.G052: Aggregate Monthly Hours Worked.
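CEIC's summary sentence combines several statistics: the latest value, the month-on-month change, the median (which CEIC labels "averaging ... (Median)"), the observation count, and the record high and low. The hypothetical helper below computes them from a chronological list of (label, value) pairs. The example uses only the three values quoted in the description, not the full 559-observation series, so its median is not CEIC's reported median; only the month-on-month change (225,972.411 - 225,859.486 = 112.925 thousand hours) is meaningful here.

```python
from statistics import median


def summarize(series):
    """Summary statistics for a chronological list of (label, value) pairs."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    labels, values = zip(*series)
    hi = max(range(len(values)), key=values.__getitem__)
    lo = min(range(len(values)), key=values.__getitem__)
    return {
        "latest": series[-1],
        "change_from_previous": series[-1][1] - series[-2][1],
        "median": median(values),   # what CEIC reports as "averaging ... (Median)"
        "observations": len(values),
        "high": (labels[hi], values[hi]),
        "low": (labels[lo], values[lo]),
    }


# Only the three values quoted in the description, not the full series
quoted = [("Jul 1978", 46197.368), ("Dec 2024", 225859.486), ("Jan 2025", 225972.411)]
print(round(summarize(quoted)["change_from_previous"], 3))  # 112.925
```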
Script for aggregating Norfolk, VA environmental data to daily time scale
hydroshare.org
search.dataone.org
zip
Updated Mar 1, 2018
Cite
Jeff Sadler (2018). Script for aggregating Norfolk, VA environmental data to daily time scale [Dataset]. http://doi.org/10.4211/hs.41c8d8f8788c4ba0b0bfbb924fe1d403
Available download formats: zip (145.7 KB)
Unique identifier
https://doi.org/10.4211/hs.41c8d8f8788c4ba0b0bfbb924fe1d403
Dataset updated
Mar 1, 2018
Dataset provided by
HydroShare
Authors
Jeff Sadler
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 2010 - Nov 1, 2016
Area covered

Description
Script and accompanying IPython notebook, written in Python 2.7, for aggregating sub-daily environmental data (rainfall, tide, wind, groundwater) to a daily timescale. The input data are from Norfolk, Virginia. Several aggregation methods are used, including averages and maximums. The aggregated data are combined with street flood report data for data-driven predictive modeling. The script in this resource was used in the analysis described in this Journal of Hydrology paper: https://doi.org/10.1016/j.jhydrol.2018.01.044.
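A minimal Python 3 sketch of sub-daily to daily aggregation with the two statistics named above (mean and maximum). It is not the resource's Python 2.7 script; the (timestamp, value) record format and grouping by calendar day of the timestamp as given are assumptions, and the readings in the example are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def aggregate_daily(records):
    """Group (timestamp, value) records by calendar day and return daily mean and maximum."""
    by_day = defaultdict(list)
    for ts, value in records:
        by_day[ts.date()].append(value)
    return {day: {"mean": fmean(vals), "max": max(vals)} for day, vals in sorted(by_day.items())}


# Hypothetical sub-daily readings
readings = [
    (datetime(2016, 10, 9, 0), 1.0),
    (datetime(2016, 10, 9, 6), 3.0),
    (datetime(2016, 10, 10, 12), 2.0),
]
for day, stats in aggregate_daily(readings).items():
    print(day, stats)
```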
India CS: Aggregate Deposits of Residents: Time
ceicdata.com
Updated Mar 26, 2025
Cite
India CS: Aggregate Deposits of Residents: Time [Dataset]. https://www.ceicdata.com/en/india/commercial-bank-survey/cs-aggregate-deposits-of-residents-time
Dataset updated
Mar 26, 2025
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Oct 1, 2017 - Sep 1, 2018
Area covered
India
Variables measured
Loans
Description
India CS: Aggregate Deposits of Residents: Time was reported at 103,241,410.000 INR mn in Sep 2018, up from 102,539,530.000 INR mn in Aug 2018. The series is updated monthly; across 235 observations from Mar 1999 to Sep 2018, its median is 30,494,680.000 INR mn. It reached an all-time high of 103,241,410.000 INR mn in Sep 2018 and a record low of 5,454,360.000 INR mn in Mar 1999. The series remains active in CEIC and is reported by the Reserve Bank of India. It is categorized under Global Database’s India – Table IN.KAC003: Commercial Bank Survey.
On the Stability of the Excess Sensitivity of Aggregate Consumption Growth...
b2find.dkrz.de
Updated Oct 24, 2023
+ more versions
Cite
(2023). On the Stability of the Excess Sensitivity of Aggregate Consumption Growth in the USA (replication data) - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/59734c18-eb3f-5cb2-b52e-d42d14b0d180
Dataset updated
Oct 24, 2023
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Description
This paper investigates whether there is time variation in the excess sensitivity of aggregate consumption growth to anticipated aggregate disposable income growth using quarterly US data over the period 1953-2014. Our empirical framework contains the possibility of stickiness in aggregate consumption growth and takes into account measurement error and time aggregation. Our empirical specification is cast into a Bayesian state-space model and estimated using Markov chain Monte Carlo (MCMC) methods. We use a Bayesian model selection approach to deal with the non-regular test for the null hypothesis of no time variation in the excess sensitivity parameter. Anticipated disposable income growth is calculated by incorporating an instrumental variables estimation approach into our MCMC algorithm. Our results suggest that the excess sensitivity parameter in the USA is stable at around 0.23 over the entire sample period.
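The paper estimates a Bayesian state-space model by MCMC. As a much simpler illustration of what a time-varying excess sensitivity parameter means, the sketch below tracks a random-walk coefficient λ_t in Δc_t = λ_t·Δy_t + ε_t with a scalar Kalman filter. It is not the authors' specification: it omits stickiness, measurement error, time aggregation, and instrumental variables, and the variances q and r are assumed rather than estimated. Setting q = 0 corresponds to a constant parameter, i.e. no time variation. The data in the example are hypothetical.

```python
def track_sensitivity(dc, dy, q=1e-5, r=1e-4, lam0=0.0, p0=1.0):
    """Scalar Kalman filter for dc[t] = lam[t] * dy[t] + eps, lam[t] = lam[t-1] + eta.

    q: assumed variance of the random-walk innovation eta (q = 0 means constant lam)
    r: assumed variance of the observation noise eps
    Returns the filtered estimates of lam[t].
    """
    lam, p = lam0, p0
    path = []
    for c, y in zip(dc, dy):
        p += q                      # predict: the random walk adds variance q
        s = y * y * p + r           # variance of the prediction error
        k = p * y / s               # Kalman gain
        lam += k * (c - y * lam)    # update with the prediction error
        p *= 1.0 - k * y            # posterior variance of lam
        path.append(lam)
    return path


dy = [0.5 + 0.25 * (t % 5) for t in range(200)]  # hypothetical income growth
dc = [0.23 * y for y in dy]                      # consumption growth with constant λ = 0.23
print(round(track_sensitivity(dc, dy)[-1], 3))   # 0.23
```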
WWF Italy (aggregated per 1-degree cell)
erddap.eurobis.org
obis.org
+2 more
Updated Jan 15, 2025
+ more versions
Cite
Casale (2025). WWF Italy (aggregated per 1-degree cell) [Dataset]. https://erddap.eurobis.org/erddap/info/zd_1826_1deg/index.html
Dataset updated
Jan 15, 2025
Dataset authored and provided by
Casale
Area covered

Variables measured
sex, time, Notes, aphia_id, latitude, TimeOfDay, lifestage, longitude, DayCollected, BasisOfRecord, and 5 more
Description
Original provider: Paolo Casale Dataset credits: Data provider WWF Italy's Sea Turtle Network Originating data center Satellite Tracking and Analysis Tool (STAT) Supplemental information: Visit STAT's project page for additional information. This dataset is a summarized representation of the telemetry locations aggregated per species per 1-degree cell. AccConID=24 AccConstrDescription=This license lets others remix, tweak, and build upon your work non-commercially, and although their new works must also acknowledge you and be non-commercial, they don’t have to license their derivative works on the same terms AccConstrDisplay=This dataset is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. AccConstrEN=Attribution-NonCommercial (CC BY-NC) AccessConstraint=Attribution-NonCommercial (CC BY-NC) AccessConstraints=This dataset is a summarized representation of the telemetry locations aggregated per species per 1-degree cell. Acronym=None added_date=2024-06-04 11:58:39.543000 BrackishFlag=0 CDate=2023-04-12 cdm_data_type=Other CheckedFlag=0 Citation=Casale P. 2021. WWF Italy. Data originated from Satellite Tracking and Analysis Tool (STAT; http://www.seaturtle.org/tracking/index.shtml?project_id=184). Comments=None ContactEmail=paolo.casale1@gmail.com Conventions=COARDS, CF-1.6, ACDD-1.3 CurrencyDate=None DasID=8288 DasOrigin=None DasType=None DasTypeID=None DateLastModified={'date': '2025-02-18 01:34:01.301036', 'timezone_type': 1, 'timezone': '+01:00'} DescrCompFlag=0 DescrTransFlag=0 Easternmost_Easting=49.5 EmbargoDate=None EngAbstract=Original provider: Paolo Casale Dataset credits: Data provider WWF Italy's Sea Turtle Network Originating data center Satellite Tracking and Analysis Tool (STAT) Supplemental information: Visit STAT's project page for additional information. This dataset is a summarized representation of the telemetry locations aggregated per species per 1-degree cell. 
EngDescr=None FreshFlag=0 GBIF_UUID=5e413639-a91c-41ba-aa33-8583c479a3fa geospatial_lat_max=47.5 geospatial_lat_min=25.5 geospatial_lat_units=degrees_north geospatial_lon_max=49.5 geospatial_lon_min=-53.5 geospatial_lon_units=degrees_east infoUrl=None InputNotes=None institution=WWF License=https://creativecommons.org/licenses/by-nc/4.0 Lineage=None MarineFlag=1 modified_sync=2024-05-21 00:00:00 Northernmost_Northing=47.5 OrigAbstract=None OrigDescr=None OrigDescrLang=None OrigDescrLangNL=None OrigLangCode=None OrigLangCodeExtended=None OrigLangID=None OrigTitle=None OrigTitleLang=None OrigTitleLangCode=None OrigTitleLangID=None OrigTitleLangNL=None Progress=None PublicFlag=1 ReleaseDate=Apr 24 2021 12:00AM ReleaseDate0=2021-04-24 RevisionDate=None SizeReference=None sourceUrl=(local files) Southernmost_Northing=25.5 standard_name_vocabulary=CF Standard Name Table v70 StandardTitle=WWF Italy (aggregated per 1-degree cell) StatusID=1 subsetVariables=ScientificName,BasisOfRecord,YearCollected,MonthCollected,DayCollected,sex,lifestage,aphia_id TerrestrialFlag=0 UDate=2023-04-20 VersionDate=Apr 24 2021 12:00AM VersionDay=None VersionMonth=None VersionName=None VersionYear=None VlizCoreFlag=1 Westernmost_Easting=-53.5
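A sketch of aggregating telemetry locations per species per 1-degree cell. Cell centres on x.5 values are inferred from the geospatial bounds above (e.g. geospatial_lat_max=47.5, geospatial_lon_min=-53.5); whether the published records count locations or only flag presence is not stated, so counting is an assumption. The function names and example coordinates are hypothetical.

```python
import math
from collections import Counter


def cell_center(lat, lon):
    """Centre of the 1-degree cell containing (lat, lon); centres fall on x.5 values."""
    return math.floor(lat) + 0.5, math.floor(lon) + 0.5


def aggregate_per_cell(locations):
    """locations: iterable of (species, lat, lon). Counts locations per (species, cell centre)."""
    return Counter((species, *cell_center(lat, lon)) for species, lat, lon in locations)


fixes = [("Caretta caretta", 43.72, 10.40), ("Caretta caretta", 43.10, 10.99)]
print(aggregate_per_cell(fixes))  # both fixes fall in the cell centred at (43.5, 10.5)
```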
NMME CCSM4 Pressure at Sea Level Daily Aggregation R01 PSL By time,...
ncei.noaa.gov
+ more versions
Cite
NMME CCSM4 Pressure at Sea Level Daily Aggregation R01 PSL By time, latitude, longitude [Dataset]. https://www.ncei.noaa.gov/erddap/info/nmme_ccsm4_psl_day_r01_by_time_LAT_LON/index.html
Time period covered
Jan 1, 2018 - Feb 28, 2026
Area covered

Variables measured
PSL, time, latitude, longitude
Description
NMME CCSM4 Pressure at Sea Level Daily Aggregation R01 PSL Dimensioned By time, latitude, longitude. _CoordSysBuilder=ucar.nc2.dataset.conv.CF1Convention cdm_data_type=Grid contact=Dughong Min (dmin@rsmas.miami.edu) and Ben Kirtman (bkirtman@rsmas.miami.edu) Conventions=CF-1.4 Easternmost_Easting=359.0 endmonth=01 endyear=2026 experiment=Febuary 2025 Forecast experiment_id=Mon Mar 10 12:06:20 PM EDT 2025 frequency=day Generator=NCL v.6.0 geospatial_lat_max=90.0 geospatial_lat_min=-90.0 geospatial_lat_resolution=1.0 geospatial_lat_units=degrees_north geospatial_lon_max=359.0 geospatial_lon_min=0.0 geospatial_lon_resolution=1.0 geospatial_lon_units=degrees_east history=FMRC Best Dataset infoUrl=https://www.ncei.noaa.gov/thredds/catalog/model-nmme_ccsm4_psl_day_r01_agg/catalog.html?dataset=model-nmme_ccsm4_psl_day_r01_agg/NMME_CCSM4_Pressure_at_Sea_Level_Daily_Aggregation_R01_best.ncd institution=Univ. of Miami - Rosenstiel School of Marine & Atmosphereric Science institution_id=UM-RSMAS location=Proto fmrc:NMME_CCSM4_Pressure_at_Sea_Level_Daily_Aggregation_R01 model_id=CCSM4_0_a02 modeling_realm=atmos Northernmost_Northing=90.0 project_id=National Multi-Model Ensembles(NMME) project realization=01 References=Ben P. Kirtman, Dughong Min. (2009) Multimodel Ensemble ENSO Prediction with CCSM and CFS. Monthly Weather Review 137:9, 2908-2930 sourceUrl=https://www.ncei.noaa.gov/thredds/dodsC/model-nmme_ccsm4_psl_day_r01_agg/NMME_CCSM4_Pressure_at_Sea_Level_Daily_Aggregation_R01_best.ncd Southernmost_Northing=-90.0 startmonth=02 startyear=2025 time_coverage_end=2026-02-28T12:00:00Z time_coverage_start=2018-01-01T12:00:00Z Westernmost_Easting=0.0
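The attribute history=FMRC Best Dataset refers to the "best time series" of a forecast model run collection (FMRC), which takes, for each valid time, the data from the most recent run available. A conceptual sketch, assuming each run is a mapping from valid time to value; it is not the THREDDS implementation, which handles more cases, and the runs in the example are hypothetical.

```python
def best_time_series(runs):
    """runs: {run_time: {valid_time: value}}.

    For every valid time, keep the value from the most recent run that covers it.
    """
    best = {}
    for run_time in sorted(runs):                     # oldest run first ...
        for valid_time, value in runs[run_time].items():
            best[valid_time] = value                  # ... so newer runs overwrite older ones
    return dict(sorted(best.items()))


# Two hypothetical overlapping runs, times as integers
runs = {0: {0: 1.0, 1: 1.1, 2: 1.2}, 1: {1: 2.1, 2: 2.2, 3: 2.3}}
print(best_time_series(runs))  # {0: 1.0, 1: 2.1, 2: 2.2, 3: 2.3}
```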
Data_Sheet_1_An Expanded Polyproline Domain Maintains Mutant Huntingtin...
frontiersin.figshare.com
docx
Updated Jun 9, 2023
+ more versions
Cite
Maria Lucia Pigazzini; Mandy Lawrenz; Anca Margineanu; Gabriele S. Kaminski Schierle; Janine Kirstein (2023). Data_Sheet_1_An Expanded Polyproline Domain Maintains Mutant Huntingtin Soluble in vivo and During Aging.docx [Dataset]. http://doi.org/10.3389/fnmol.2021.721749.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fnmol.2021.721749.s001
Dataset updated
Jun 9, 2023
Dataset provided by
Frontiers
Authors
Maria Lucia Pigazzini; Mandy Lawrenz; Anca Margineanu; Gabriele S. Kaminski Schierle; Janine Kirstein
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Huntington’s disease is a dominantly inherited neurodegenerative disorder caused by the expansion of a CAG repeat, encoding the amino acid glutamine (Q), in the first exon of the protein huntingtin. Above a threshold of Q39, HTT exon 1 (HTTEx1) tends to misfold and aggregate into large intracellular structures, but whether these end-stage aggregates or their on-pathway intermediates are responsible for cytotoxicity is still debated. HTTEx1 can be divided into three domains: an N-terminal 17-amino-acid region (N17), the polyglutamine (polyQ) expansion, and a C-terminal proline-rich domain (PRD). Alongside the expanded polyQ, these flanking domains influence the aggregation propensity of HTTEx1, with N17 initiating and promoting aggregation and the PRD modulating it. In this study we focus on the first 11 amino acids of the PRD, a stretch of pure prolines that is an evolutionarily recent addition next to the expanding polyQ region. We hypothesize that this proline region is expanding alongside the polyQ to counteract its ability to misfold and cause toxicity, and that expanding this proline region would be beneficial overall. We generated HTTEx1 mutants lacking each flanking domain individually, missing the first 11 prolines of the PRD, or carrying an expanded version of this proline stretch. We then followed their aggregation landscape in vitro with a battery of biochemical assays, and in vivo in novel C. elegans models expressing the HTTEx1 mutants pan-neuronally. Using fluorescence lifetime imaging, we observed the aggregation propensity of all HTTEx1 mutants during aging and correlated it with toxicity via various phenotypic assays. We found that an expanded proline stretch helps keep HTTEx1 soluble over time, regardless of polyQ length.
However, the expanded prolines were advantageous only in promoting the survival and fitness of nematodes carrying a pathogenic stretch of Q48, and were extremely deleterious to nematodes expressing a physiological stretch of Q23. Our results reveal the unique importance of the prolines, which have evolved and are still evolving alongside expanding glutamines, in promoting the function of HTTEx1 and avoiding pathology.
Data from: S1 Data -
plos.figshare.com
txt
Updated Dec 14, 2023
+ more versions
Cite
Mahadee Al Mobin; Md. Kamrujjaman (2023). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0295803.s001
Available download formats: txt
Unique identifier
https://doi.org/10.1371/journal.pone.0295803.s001
Dataset updated
Dec 14, 2023
Dataset provided by
PLOS ONE
Authors
Mahadee Al Mobin; Md. Kamrujjaman
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data scarcity and discontinuity are common in healthcare and epidemiological datasets, yet such data are often needed to make informed decisions and forecast upcoming scenarios. To avoid these problems, the data are often processed into monthly or yearly aggregates, on which prevalent forecasting tools such as Autoregressive Integrated Moving Average (ARIMA), Seasonal Autoregressive Integrated Moving Average (SARIMA), and TBATS often fail to provide satisfactory results. Artificial data synthesis methods have proven to be a powerful tool for tackling these challenges. This paper proposes a novel algorithm, Stochastic Bayesian Downscaling (SBD), based on the Bayesian approach, that can regenerate downscaled time series of varying lengths from aggregated data while preserving most of the statistical characteristics and the aggregated sum of the original data. The paper presents two epidemiological time series case studies from Bangladesh (dengue and COVID-19) to showcase the workflow of the algorithm. The case studies show that the synthesized data agree with the original data in statistical properties, trend, seasonality, and residuals. For forecasting performance, using the last 12 years of dengue infection data in Bangladesh, we were able to decrease error terms by up to 72.76% using synthetic data instead of the actual aggregated data.
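The details of SBD are not given here, so the sketch below is NOT the SBD algorithm. It only illustrates the constraint SBD preserves, splitting an aggregated total into sub-period values that sum back to that total, using random weights from a symmetric Dirichlet distribution (normalised gamma draws). The function name, the concentration parameter, and the example total are hypothetical.

```python
import random


def disaggregate(total, n_parts, concentration=5.0, rng=None):
    """Hypothetical helper: split `total` into n_parts non-negative values summing to `total`.

    Weights follow a symmetric Dirichlet(concentration) distribution, drawn as
    normalised gamma variates; a larger concentration gives a flatter split.
    This is not the SBD algorithm.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be positive")
    rng = rng or random.Random()
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(n_parts)]
    scale = total / sum(draws)
    return [d * scale for d in draws]


# A hypothetical monthly total of 310 cases split into 31 daily values
daily = disaggregate(310.0, 31, rng=random.Random(0))
print(round(sum(daily), 6))  # 310.0
```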
Bat-aggregated time series workflow
data.niaid.nih.gov
datadryad.org
zip
Updated Oct 15, 2024
Cite
Brian Lee (2024). Bat-aggregated time series workflow [Dataset]. http://doi.org/10.5061/dryad.w0vt4b8zf
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.w0vt4b8zf
Dataset updated
Oct 15, 2024
Dataset provided by
University of California, Santa Barbara
Authors
Brian Lee
License
CC0 1.0 https://spdx.org/licenses/CC0-1.0.html
Description
This dataset and code provide radar-based detections of Brazilian free-tailed bats (Tadarida brasiliensis) across select regions of California and Texas, compiled from weather radar data from the NEXRAD (NEXt-generation weather RADar) system. NEXRAD radars, operated by the US National Weather Service, continuously monitor the airspace and detect various airborne organisms, including birds, insects, and bats. The dataset was generated using the ‘BATS’ Python toolkit (included), which automates the retrieval, processing, and classification of radar data. It employs a pre-trained machine learning model designed to detect radar echoes associated with Brazilian free-tailed bats. The dataset includes results from machine learning models trained and tested on radar data; the best-performing model achieved an AUC of 0.963, indicating strong discrimination of bat activity. The dataset also includes pre-trained neural network and random forest models for reproducibility. It provides spatiotemporal information on bat presence at a large landscape scale and across extended timeframes. By distilling radar data into efficient summaries of bat occurrence, the dataset enables researchers to explore patterns in bat activity and their potential ecosystem services, such as insect consumption, in agricultural regions.

Methods

Data Description

This dataset provides detailed radar-based detections of Brazilian free-tailed bats (Tadarida brasiliensis) across select regions of California and Texas. The data were compiled from the NEXRAD (NEXt-generation weather RADar) system, which operates S-band Doppler weather radars across the United States. NEXRAD radars detect various airborne targets such as birds, insects, and bats. The dataset is processed using the 'BATS' Python toolkit, which automates the retrieval and classification of radar data. Using radar data sourced from the Amazon Web Services (AWS) repository, the BATS toolkit classifies radar echoes based on a machine learning model trained to identify Brazilian free-tailed bats. The dataset contains bat presence information at a pixel resolution of 70 meters, derived from radar data over multiple time periods in 2018 and 2019. This data will be useful for researchers exploring bat ecology, insectivorous bat ecosystem services, and landscape-level bat monitoring. The dataset includes:

Radar data processed to detect bat presence in California (2018) and Texas (2019)

Classified radar pixels indicating bat presence or absence

Machine learning-derived bat occurrence probabilities (thresholded for binary classification)

GeoTIFF files that aggregate radar data over six-month periods

Data Collection

The dataset was generated using NEXRAD radar data sourced from AWS. The BATS Python toolkit facilitated the collection and processing of radar data files, automating the pipeline from raw radar retrieval to bat detection. Radar data were selected based on specific regions, timeframes, and weather conditions associated with confirmed Brazilian free-tailed bat emergence events. The radar data collected span 11 weather-free days in California (2018) and 7 days in Texas (2019). Reference data on bat emergence were gathered from field observations provided by local bat monitoring organizations.

Data Processing

Once downloaded, the raw radar data (Level II “.gz” files) were processed using the Py-ART library, which is designed for radar data manipulation. Py-ART converted the radar data from native polar coordinates into a uniform Cartesian grid, with a resampled pixel resolution of 70 meters to facilitate accurate bat detection. The processed radar data were then classified using a machine learning pipeline. The BATS toolkit includes classification scripts in which radar echoes are evaluated by pre-trained machine learning models. Three machine learning models were applied: random forest (RF), support vector machines (SVM), and artificial neural networks (ANN). The ANN model, selected for its superior performance (AUC of 0.963), was used to classify each radar pixel as either containing or not containing Brazilian free-tailed bats. The model outputs a binary classification based on a 90% probability threshold, to ensure accurate detection while minimizing false positives.

Evaluation and Quality Control

To ensure the accuracy of the model and its classifications, the dataset was evaluated using standard binary classification metrics: precision, recall, AUC (Area Under the ROC Curve), and precision-recall curves. Hyperparameter tuning and spatial cross-validation were performed to account for spatial autocorrelation in the radar data and to improve the generalization of the machine learning models. Training data for the model were primarily sourced from California, while independent testing was conducted using radar data from Texas. The dataset also includes labeled data representing noise sources (such as birds, vehicles, and weather phenomena) to reduce false positives during classification. By processing large volumes of radar data and applying machine learning algorithms, the BATS toolkit condensed terabytes of raw radar data into concise GeoTIFF maps of bat presence, enabling efficient analysis of bat populations across landscapes.
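The processing and evaluation steps can be sketched in simplified form: mapping a radar gate to a 70 m Cartesian cell, applying the 90% probability threshold, and computing precision and recall. Assumptions: a flat-earth, single-elevation projection with azimuth measured clockwise from north (Py-ART's actual gridding, e.g. pyart.map.grid_from_radars, interpolates in three dimensions), and an inclusive threshold (p >= 0.9). None of these helpers are part of the BATS toolkit; they are hypothetical.

```python
import math

CELL_SIZE_M = 70.0  # resampled pixel size stated in the description
THRESHOLD = 0.9     # 90% probability threshold; inclusive comparison is an assumption


def polar_to_cell(range_m, azimuth_deg, cell=CELL_SIZE_M):
    """Map a radar gate (range, azimuth clockwise from north) to a grid cell (col, row).

    Flat-earth simplification centred on the radar; ignores beam elevation.
    """
    az = math.radians(azimuth_deg)
    x = range_m * math.sin(az)  # metres east of the radar
    y = range_m * math.cos(az)  # metres north of the radar
    return math.floor(x / cell), math.floor(y / cell)


def classify(probabilities, threshold=THRESHOLD):
    """Binary bat presence from model probabilities."""
    return [p >= threshold for p in probabilities]


def precision_recall(predicted, actual):
    """Precision and recall of boolean predictions against boolean labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


print(polar_to_cell(700.0, 90.0))       # (10, 0): 700 m due east
print(classify([0.95, 0.5, 0.9]))       # [True, False, True]
print(precision_recall([True, True, False, False], [True, False, True, False]))  # (0.5, 0.5)
```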
Aggregation of 1 micron latex spheres suspended in sterile EPS isolated from...
data.gulfresearchinitiative.org
data.griidc.org
Updated Jun 25, 2018
Cite
GRIIDC (2018). Aggregation of 1 micron latex spheres suspended in sterile EPS isolated from bacterial isolates and consortia on a micro-scale crude oil droplet [Dataset]. http://doi.org/10.7266/N7BV7F6V
Unique identifier
https://doi.org/10.7266/N7BV7F6V
Dataset updated
Jun 25, 2018
Dataset provided by
GRIIDC
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The effect of extracellular polymeric substances (EPS) extracted from Sagittula stellata and natural Gulf of Mexico bacterial consortia on aggregation of 1 micron latex particles on a crude oil drop surface is observed. Crude oil droplets between 100-200 microns are pinned in a microchannel while time lapse microscopy observes particle aggregation with time. Aggregation rates correlate positively with increasing protein-carbohydrate ratios in the varying EPS compositions studied.
CARESAT (aggregated per 1-degree cell)
erddap.eurobis.org
emodnet.ec.europa.eu
+2 more
Updated Jan 17, 2018
+ more versions
Cite
Luschi (2018). CARESAT (aggregated per 1-degree cell) [Dataset]. https://erddap.eurobis.org/erddap/info/zd_1686_1deg/index.html
Dataset updated
Jan 17, 2018
Dataset authored and provided by
Luschi
Time period covered
Jan 1, 2014 - Aug 1, 2020
Area covered

Variables measured
sex, time, Notes, aphia_id, latitude, TimeOfDay, lifestage, longitude, DayCollected, BasisOfRecord, and 5 more
Description
CARESAT is a project funded by the Tuscany Region (Italy) aiming to use satellite telemetry to increase the limited information currently available on the movements of loggerhead turtles frequenting Tuscany waters and the Pelagos Marine Sanctuary. To this aim, turtles found in Tuscan waters and rehabilitated in Tuscan rescue centers will be equipped with satellite transmitters, to reconstruct the movements made by tracked individuals, to identify the areas of the Sanctuary that are mainly frequented, and to reveal hitherto unknown aspects of their ecology and behavior. AccConID=24 AccConstrDescription=This license lets others remix, tweak, and build upon your work non-commercially, and although their new works must also acknowledge you and be non-commercial, they don’t have to license their derivative works on the same terms AccConstrDisplay=This dataset is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. AccConstrEN=Attribution-NonCommercial (CC BY-NC) AccessConstraint=Attribution-NonCommercial (CC BY-NC) AccessConstraints=None Acronym=None added_date=2023-04-27 11:03:36.983000 BrackishFlag=None CDate=2023-02-08 cdm_data_type=Other CheckedFlag=0 Citation=Luschi P. 2021. CARESAT. Data originated from Satellite Tracking and Analysis Tool (STAT; http://www.seaturtle.org/tracking/index.shtml?project_id=1050). Comments=Only data aggregated per 1-degree cell are available through OBIS. The non-aggregated data are available through the OBIS-SEAMAP Portal. 
ContactEmail=pluschi@biologia.unipi.it Conventions=COARDS, CF-1.6, ACDD-1.3 CurrencyDate=None DasID=8204 DasOrigin=Sensor platform DasType=Data DasTypeID=1 DateLastModified={'date': '2025-02-13 01:37:29.538097', 'timezone_type': 1, 'timezone': '+01:00'} DescrCompFlag=0 DescrTransFlag=0 Easternmost_Easting=14.5 EmbargoDate=None EngAbstract=CARESAT is a project funded by the Tuscany Region (Italy) aiming to use satellite telemetry to increase the limited information currently available on the movements of loggerhead turtles frequenting Tuscany waters and the Pelagos Marine Sanctuary. To this aim, turtles found in Tuscan waters and rehabilitated in Tuscan rescue centers will be equipped with satellite transmitters, to reconstruct the movements made by tracked individuals, to identify the areas of the Sanctuary that are mainly frequented, and to reveal hitherto unknown aspects of their ecology and behavior. EngDescr=Original provider: Islameta Group, University of Pisa

Dataset credits: Data provider Islameta Group, Dept. of Biology - University of Pisa Originating data center Satellite Tracking and Analysis Tool (STAT) Project partner Parco Regionale della Maremma (Maremma Regional Park) Project sponsor or sponsor description Osservatorio Toscano Cetacei e Tartarughe (Tuscan Observatory Cetaceans and Turtles)

Abstract: CARESAT is a project funded by the Tuscany Region (Italy) aiming to use satellite telemetry to increase the limited information currently available on the movements of loggerhead turtles frequenting Tuscany waters and the Pelagos Marine Sanctuary. To this aim, turtles found in Tuscan waters and rehabilitated in Tuscan rescue centers will be equipped with satellite transmitters, to reconstruct the movements made by tracked individuals, to identify the areas of the Sanctuary that are mainly frequented, and to reveal hitherto unknown aspects of their ecology and behavior.

Supplemental information: Visit STAT's project page for additional information.

This dataset is a summarized representation of the telemetry locations aggregated per species per 1-degree cell. FreshFlag=None GBIF_UUID=03fdaaf6-227f-4731-9ee4-72ad5b28d80d geospatial_lat_max=47.5 geospatial_lat_min=37.5 geospatial_lat_units=degrees_north geospatial_lon_max=14.5 geospatial_lon_min=-16.5 geospatial_lon_units=degrees_east infoUrl=None InputNotes=None institution=None License=https://creativecommons.org/licenses/by-nc/4.0 Lineage=None MarineFlag=1 modified_sync=2023-03-31 00:00:00 Northernmost_Northing=47.5 OrigAbstract=None OrigDescr=None OrigDescrLang=None OrigDescrLangNL=None OrigLangCode=None OrigLangCodeExtended=None OrigLangID=None OrigTitle=None OrigTitleLang=None OrigTitleLangCode=None OrigTitleLangID=None OrigTitleLangNL=None Progress=None PublicFlag=1 ReleaseDate=Jul 11 2021 10:00PM ReleaseDate0=2021-07-11 RevisionDate=None SizeReference=None sourceUrl=(local files) Southernmost_Northing=37.5 standard_name_vocabulary=CF Standard Name Table v70 StandardTitle=CARESAT (aggregated per 1-degree cell) StatusID=1 subsetVariables=ScientificName,BasisOfRecord,YearCollected,MonthCollected,DayCollected,sex,lifestage,aphia_id TerrestrialFlag=None time_coverage_end=2020-08-01T01:00:00Z time_coverage_start=2014-01-01T01:00:00Z UDate=2023-11-20 VersionDate=Jul 11 2021 10:00PM VersionDay=12 VersionMonth=7 VersionName=None VersionYear=2021 VlizCoreFlag=1 Westernmost_Easting=-16.5



NCOM Region 10 Aggregation/Best Time Series

Envestnet | Yodlee's De-Identified Online Purchase Data | Row/Aggregate...

HYCOM Surface Aggregation/Best Time Series

Data from: Usable observations over Europe: Evaluation of compositing...

CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly...

Data from: A consistent data model for different data granularity in control...

NCOM SFC8 Hindcast Aggregation/Best Time Series

Data from: Using partial aggregation in Spatial Capture Recapture

Monthly aggregated GLASS FAPAR V6 (250 m): 50th percentile monthly...

Australia Aggregate Monthly Hours Worked: Trend: Part Time: Female

Script for aggregating Norfolk, VA environmental data to daily time scale

India CS: Aggregate Deposits of Residents: Time

On the Stability of the Excess Sensitivity of Aggregate Consumption Growth...

WWF Italy (aggregated per 1-degree cell)

NMME CCSM4 Pressure at Sea Level Daily Aggregation R01 PSL By time,...

Data_Sheet_1_An Expanded Polyproline Domain Maintains Mutant Huntingtin...

Data from: S1 Data -

Bat-aggregated time series workflow

Aggregation of 1 micron latex spheres suspended in sterile EPS isolated from...

CARESAT (aggregated per 1-degree cell)
