Best time series, taking the data from the most recent run available.
Envestnet®| Yodlee®'s Online Purchase Data (Aggregate/Row) Panels consist of de-identified, near-real-time (T+1) USA credit/debit/ACH transaction-level data, offering a wide view of the consumer activity ecosystem. The underlying data is sourced from end users leveraging the aggregation portion of the Envestnet®| Yodlee® financial technology platform.
Envestnet | Yodlee Consumer Panels (Aggregate/Row) include data relating to millions of transactions, including ticket size and merchant location. The dataset includes de-identified credit/debit card and bank transactions (such as a payroll deposit, account transfer, or mortgage payment). Our coverage offers insights into areas such as consumer, TMT, energy, REITs, internet, utilities, ecommerce, MBS, CMBS, equities, credit, commodities, FX, and corporate activity. We apply rigorous data science practices to deliver daily KPIs that are focused, relevant, and ready to put into production.
We offer free trials. Our team is available to provide support for loading, validation, sample scripts, or other services you may need to generate insights from our data.
Investors, corporate researchers, and corporates can use our data to answer key business questions such as:
- How much are consumers spending with specific merchants/brands and how is that changing over time?
- Is the share of consumer spend at a specific merchant increasing or decreasing?
- How are consumers reacting to new products or services launched by merchants?
- For loyal customers, how is the share of spend changing over time?
- What is the company’s market share in a region for similar customers?
- Is the company’s loyal user base increasing or decreasing?
- Is the lifetime customer value increasing or decreasing?
Additional Use Cases:
- Use spending data to analyze sales/revenue broadly (sector-wide) or granularly (company-specific). Historically, our tracked consumer spend has correlated above 85% with company-reported data from thousands of firms. Users can sort and filter by many metrics and KPIs, such as sales and transaction growth rates and online or offline transactions, as well as view customer behavior within a geographic market at a state or city level.
- Reveal cohort consumer behavior to decipher long-term behavioral consumer spending shifts. Measure market share, wallet share, loyalty, consumer lifetime value, retention, demographics, and more.
- Study the effects of inflation rates via such metrics as increased total spend, ticket size, and number of transactions.
- Seek out alpha-generating signals or manage your business strategically with essential, aggregated transaction and spending data analytics.
Use Case Categories (our data supports a wide range of use cases, and we look forward to working with new ones):
1. Market Research: Company Analysis, Company Valuation, Competitive Intelligence, Competitor Analysis, Competitor Analytics, Competitor Insights, Customer Data Enrichment, Customer Data Insights, Customer Data Intelligence, Demand Forecasting, Ecommerce Intelligence, Employee Pay Strategy, Employment Analytics, Job Income Analysis, Job Market Pricing, Marketing, Marketing Data Enrichment, Marketing Intelligence, Marketing Strategy, Payment History Analytics, Price Analysis, Pricing Analytics, Retail, Retail Analytics, Retail Intelligence, Retail POS Data Analysis, and Salary Benchmarking
2. Investment Research: Financial Services, Hedge Funds, Investing, Mergers & Acquisitions (M&A), Stock Picking, Venture Capital (VC)
3. Consumer Analysis: Consumer Data Enrichment, Consumer Intelligence
4. Market Data: Analytics, B2C Data Enrichment, Bank Data Enrichment, Behavioral Analytics, Benchmarking, Customer Insights, Customer Intelligence, Data Enhancement, Data Enrichment, Data Intelligence, Data Modeling, Ecommerce Analysis, Ecommerce Data Enrichment, Economic Analysis, Financial Data Enrichment, Financial Intelligence, Local Economic Forecasting, Location-based Analytics, Market Analysis, Market Analytics, Market Intelligence, Market Potential Analysis, Market Research, Market Share Analysis, Sales, Sales Data Enrichment, Sales Enablement, Sales Insights, Sales Intelligence, Spending Analytics, Stock Market Predictions, and Trend Analysis
Best time series, taking the data from the most recent run available.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Landsat and Sentinel-2 data archives provide ever-increasing amounts of satellite data. However, the availability of usable observations varies greatly in space and time. Pixel-based compositing, which generates temporally equidistant cloud-free synthetic images, can mitigate temporal variability by constructing uninterrupted time series using different compositing windows. Here, we evaluated the feasibility of using compositing windows ranging from five days to one year for 1984-2021 Landsat and 2015-2021 Sentinel-2 time series to derive uninterrupted time series across Europe. We considered separate and joint use of both data archives and analyzed the spatio-temporal availability of composites during each calendar year and pixel-specific growing season, across a variety of time windows and under hypothesized data interpolation. Our results demonstrated opportunities and limitations in the available data records to support medium- and long-term analyses requiring uninterrupted time series of composites with sub-annual temporal resolution. Spatial disparities across different compositing windows provide guidance on the feasibility of workflows relying on different data densities and on the challenges in wall-to-wall analyses. The feasibility of consistent time series based on composites with sub-monthly aggregation periods was mostly limited to the combined Landsat and Sentinel-2 archives after 2015, yet in some geographies it requires interpolation of up to 50% of the data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CESNET-TimeSeries24: The dataset for network traffic forecasting and anomaly detection
The CESNET-TimeSeries24 dataset was collected by long-term monitoring of selected statistical metrics over 40 weeks for each IP address on the ISP network CESNET3 (Czech Education and Science Network). The dataset encompasses network traffic from more than 275,000 active IP addresses, assigned to a wide variety of devices, including office computers, NATs, servers, WiFi routers, honeypots, and video-game consoles found in dormitories. Moreover, the dataset is rich in network anomaly types since it contains all types of anomalies, ensuring a comprehensive evaluation of anomaly detection methods. Last but not least, the CESNET-TimeSeries24 dataset provides traffic time series on institutional and IP subnet levels to cover all possible anomaly detection or forecasting scopes. Overall, the time series dataset was created from 66 billion IP flows that contain 4 trillion packets carrying approximately 3.7 petabytes of data. The CESNET-TimeSeries24 dataset is a complex real-world dataset that brings insights into the evaluation of forecasting models in real-world environments.
Please cite the usage of our dataset as:
Josef Koumar, Karel Hynek, Tomáš Čejka, Pavel Šiška, "CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting", arXiv e-prints (2024): https://doi.org/10.48550/arXiv.2409.18874
@misc{koumar2024cesnettimeseries24timeseriesdataset,
  title={CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting},
  author={Josef Koumar and Karel Hynek and Tomáš Čejka and Pavel Šiška},
  year={2024},
  eprint={2409.18874},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2409.18874},
}
Time series
We create evenly spaced time series for each IP address by aggregating IP flow records into time series datapoints. The created datapoints represent the behavior of IP addresses within a defined time window of 10 minutes. The vector of time-series metrics v_{ip, i} describes the IP address ip in the i-th time window. Thus, IP flows for vector v_{ip, i} are captured in time windows starting at t_i and ending at t_{i+1}. The time series are built from these datapoints.
Datapoints created by the aggregation of IP flows contain the following time-series metrics:
Simple volumetric metrics: the number of IP flows, the number of packets, and the transmitted data size (i.e. number of bytes)
Unique volumetric metrics: the number of unique destination IP addresses, the number of unique destination Autonomous System Numbers (ASNs), and the number of unique destination transport layer ports. The aggregation of unique volumetric metrics is memory intensive since all unique values must be stored in an array. We used a server with 41 GB of RAM, which was enough for 10-minute aggregation on the ISP network.
Ratio metrics: the ratio of UDP/TCP packets, the ratio of UDP/TCP transmitted data size, the direction ratio of packets, and the direction ratio of transmitted data size
Average metrics: the average flow duration, and the average Time To Live (TTL)
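The aggregation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the flow record keys (start, packets, bytes, dst_ip, dst_port, proto) are hypothetical, each flow is assigned to a window by its start time (an assumption), and the TCP/UDP ratio is computed as TCP packets over all packets so that it matches the documented convention of 1 for all-TCP and 0 for all-UDP traffic.

```python
from collections import defaultdict

WINDOW_SECONDS = 600  # 10-minute aggregation window


def aggregate_flows(flows, window=WINDOW_SECONDS):
    """Group the flow records of one IP address into fixed time windows.

    Each flow is a dict with hypothetical keys: start (epoch seconds),
    packets, bytes, dst_ip, dst_port, proto ("TCP" or "UDP").
    """
    buckets = defaultdict(list)
    for flow in flows:
        # Assumption: a flow belongs to the window containing its start time.
        buckets[flow["start"] // window].append(flow)

    datapoints = {}
    for i, group in sorted(buckets.items()):
        packets = sum(f["packets"] for f in group)
        tcp_packets = sum(f["packets"] for f in group if f["proto"] == "TCP")
        datapoints[i] = {
            "n_flows": len(group),
            "n_packets": packets,
            "n_bytes": sum(f["bytes"] for f in group),
            # Unique metrics require keeping every distinct value in memory.
            "n_dest_ip": len({f["dst_ip"] for f in group}),
            "n_dest_port": len({f["dst_port"] for f in group}),
            # 1.0 means all packets were TCP, 0.0 means all were UDP.
            "tcp_udp_ratio_packets": tcp_packets / packets if packets else 0.0,
        }
    return datapoints


flows = [
    {"start": 0, "packets": 10, "bytes": 1000, "dst_ip": "192.0.2.1", "dst_port": 443, "proto": "TCP"},
    {"start": 120, "packets": 30, "bytes": 4000, "dst_ip": "192.0.2.2", "dst_port": 53, "proto": "UDP"},
    {"start": 650, "packets": 5, "bytes": 500, "dst_ip": "192.0.2.1", "dst_port": 443, "proto": "TCP"},
]
points = aggregate_flows(flows)
print(points[0]["n_flows"], points[0]["tcp_udp_ratio_packets"])  # 2 0.25
```

The sets used for the unique counts are the reason this step is memory intensive at ISP scale: every distinct destination seen in a window has to be held until the window closes.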
Multiple time aggregation: The original datapoints in the dataset are aggregated by 10 minutes of network traffic. The size of the aggregation interval influences anomaly detection procedures, mainly the training speed of the detection model. However, the 10-minute intervals can be too short for longitudinal anomaly detection methods. Therefore, we added two more aggregation intervals to the dataset: 1 hour and 1 day.
Time series of institutions: We identify 283 institutions inside the CESNET3 network. These time series, aggregated per institution ID, provide a view of the institution's data.
Time series of institutional subnets: We identify 548 institution subnets inside the CESNET3 network. These time series, aggregated per institution subnet ID, provide a view of the institution subnet's data.
Data Records
The file hierarchy is described below:
cesnet-timeseries24/
|- institution_subnets/
| |- agg_10_minutes/<id_institution_subnet>.csv
| |- agg_1_hour/<id_institution_subnet>.csv
| |- agg_1_day/<id_institution_subnet>.csv
| |- identifiers.csv
|- institutions/
| |- agg_10_minutes/<id_institution>.csv
| |- agg_1_hour/<id_institution>.csv
| |- agg_1_day/<id_institution>.csv
| |- identifiers.csv
|- ip_addresses_full/
| |- agg_10_minutes/<id_ip_folder>/<id_ip>.csv
| |- agg_1_hour/<id_ip_folder>/<id_ip>.csv
| |- agg_1_day/<id_ip_folder>/<id_ip>.csv
| |- identifiers.csv
|- ip_addresses_sample/
| |- agg_10_minutes/<id_ip>.csv
| |- agg_1_hour/<id_ip>.csv
| |- agg_1_day/<id_ip>.csv
| |- identifiers.csv
|- times/
| |- times_10_minutes.csv
| |- times_1_hour.csv
| |- times_1_day.csv
|- ids_relationship.csv
|- weekends_and_holidays.csv
The following list describes time series data fields in CSV files:
id_time: Unique identifier for each aggregation interval within the time series, used to segment the dataset into specific time periods for analysis.
n_flows: Total number of flows observed in the aggregation interval, indicating the volume of distinct sessions or connections for the IP address.
n_packets: Total number of packets transmitted during the aggregation interval, reflecting the packet-level traffic volume for the IP address.
n_bytes: Total number of bytes transmitted during the aggregation interval, representing the data volume for the IP address.
n_dest_ip: Number of unique destination IP addresses contacted by the IP address during the aggregation interval, showing the diversity of endpoints reached.
n_dest_asn: Number of unique destination Autonomous System Numbers (ASNs) contacted by the IP address during the aggregation interval, indicating the diversity of networks reached.
n_dest_port: Number of unique destination transport layer ports contacted by the IP address during the aggregation interval, representing the variety of services accessed.
tcp_udp_ratio_packets: Ratio of packets sent using TCP versus UDP by the IP address during the aggregation interval, providing insight into the transport protocol usage pattern. This metric belongs to the interval <0, 1> where 1 is when all packets are sent over TCP, and 0 is when all packets are sent over UDP.
tcp_udp_ratio_bytes: Ratio of bytes sent using TCP versus UDP by the IP address during the aggregation interval, highlighting the data volume distribution between protocols. This metric belongs to the interval <0, 1> with the same rule as tcp_udp_ratio_packets.
dir_ratio_packets: Ratio of packet directions (inbound versus outbound) for the IP address during the aggregation interval, indicating the balance of traffic flow directions. This metric belongs to the interval <0, 1>, where 1 is when all packets are sent in the outgoing direction from the monitored IP address, and 0 is when all packets are sent in the incoming direction to the monitored IP address.
dir_ratio_bytes: Ratio of byte directions (inbound versus outbound) for the IP address during the aggregation interval, showing the data volume distribution in traffic flows. This metric belongs to the interval <0, 1> with the same rule as dir_ratio_packets.
avg_duration: Average duration of IP flows for the IP address during the aggregation interval, measuring the typical session length.
avg_ttl: Average Time To Live (TTL) of IP flows for the IP address during the aggregation interval, providing insight into the lifespan of packets.
Moreover, the time series created by re-aggregation contain the following time series metrics instead of n_dest_ip, n_dest_asn, and n_dest_port:
sum_n_dest_ip: Sum of numbers of unique destination IP addresses.
avg_n_dest_ip: The average number of unique destination IP addresses.
std_n_dest_ip: Standard deviation of numbers of unique destination IP addresses.
sum_n_dest_asn: Sum of numbers of unique destination ASNs.
avg_n_dest_asn: The average number of unique destination ASNs.
std_n_dest_asn: Standard deviation of numbers of unique destination ASNs.
sum_n_dest_port: Sum of numbers of unique destination transport layer ports.
avg_n_dest_port: The average number of unique destination transport layer ports.
std_n_dest_port: Standard deviation of numbers of unique destination transport layer ports.
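The re-aggregated sum, average, and standard deviation fields listed above can be illustrated with a short sketch. The input values are made up, and the use of the population standard deviation is an assumption, since the description does not say which form was used.

```python
import statistics


def reaggregate_unique_metric(values, per_bucket):
    """Collapse consecutive 10-minute counts into coarser buckets.

    Returns one (sum, avg, std) tuple per bucket. Population standard
    deviation is an assumption; the source does not specify the form.
    """
    out = []
    for start in range(0, len(values), per_bucket):
        chunk = values[start:start + per_bucket]
        out.append((sum(chunk), statistics.fmean(chunk), statistics.pstdev(chunk)))
    return out


# Twelve hypothetical 10-minute n_dest_ip values, i.e. two hours of data.
n_dest_ip = [4, 6, 5, 5, 7, 3, 10, 10, 10, 10, 10, 10]
hourly = reaggregate_unique_metric(n_dest_ip, per_bucket=6)
print(hourly[1])  # (60, 10.0, 0.0)
```

Summary statistics are needed here because a true unique count cannot be recovered from the 10-minute counts alone: the same destination may appear in several windows, so the sum is only an upper bound on the number of distinct destinations in the hour.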
Moreover, the identifiers.csv file in each dataset type contains the IDs of the time series present in that dataset type. Furthermore, the ids_relationship.csv file contains the relationships between IP addresses, institutions, and institution subnets. The weekends_and_holidays.csv file contains information about non-working days in the Czech Republic.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
After a long-running show was canceled, control charts are used to identify if and when viewing drops. Daily viewing, the finest granularity, is highly autocorrelated, so its control charts use residuals from a seasonal ARIMA model. For coarse-granularity data (weekly and monthly viewing), an approximate AR model is derived to be consistent with the finest-granularity model. With the proposed approach, a longer-memory model is used in the coarse-granularity control charts, which reduces the number of false alarms compared with control charts constructed by treating each granularity as a different measurement.
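A residual-based control chart of this kind can be sketched as follows. This is a simplified stand-in, not the paper's method: it swaps the seasonal ARIMA model for a plain AR(1) fitted by least squares, uses a lower limit of three residual standard deviations, and runs on made-up viewing figures.

```python
import statistics


def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1}; returns (c, phi, sigma)."""
    prev, curr = series[:-1], series[1:]
    mean_prev, mean_curr = statistics.fmean(prev), statistics.fmean(curr)
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    residuals = [q - (c + phi * p) for p, q in zip(prev, curr)]
    return c, phi, statistics.pstdev(residuals)


def residual_alarms(baseline, monitored, k=3.0):
    """Indices in `monitored` whose one-step residual falls below -k * sigma."""
    c, phi, sigma = fit_ar1(baseline)
    alarms, prev = [], baseline[-1]
    for i, value in enumerate(monitored):
        # Only the lower limit is checked because the question is whether viewing drops.
        if value - (c + phi * prev) < -k * sigma:
            alarms.append(i)
        prev = value
    return alarms


# Hypothetical viewing figures: a stable level of 100, then a drop to 80.
cycle = [2, -1, 0, 1, -2, 0]
baseline = [100 + d for d in cycle * 10]
monitored = [100 + d for d in cycle * 2] + [80 + d for d in cycle * 2]
alarms = residual_alarms(baseline, monitored)
print(alarms[0])  # 12, the first observation after the drop
```

Charting residuals instead of raw values matters because autocorrelated observations violate the independence assumption behind standard control limits, which is what inflates the false alarm rate.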
Best time series, taking the data from the most recent run available.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
List of Subdatasets:
General Description
The monthly aggregated Fraction of Absorbed Photosynthetically Active Radiation (FAPAR) dataset is derived from 250m 8d GLASS V6 FAPAR. The data set is derived from Moderate Resolution Imaging Spectroradiometer (MODIS) reflectance and LAI data using several other FAPAR products (MODIS Collection 6, GLASS FAPAR V5, and PROBA-V1 FAPAR) to generate a bidirectional long-short-term memory (Bi-LSTM) model to estimate FAPAR. The dataset time spans from March 2000 to December 2021 and provides data that covers the entire globe. The dataset can be used in many applications like land degradation modeling, land productivity mapping, and land potential mapping. The dataset includes:
Derived from monthly time-series. This dataset provides a linear trend model for the p95 variable: (1) slope beta mean (p95.beta_m), (2) p-value for beta (p95.beta_pv), (3) intercept alpha mean (p95.alpha_m), (4) p-value for alpha (p95.alpha_pv), and (5) coefficient of determination R2 (p95.r2_m).
Monthly aggregation with three standard statistics: (1) 5th percentile (p05), (2) median (p50), and (3) 95th percentile (p95). For each month, we aggregate all composites within that month plus one composite each before and after, ending up with 5 to 6 composites for a single month depending on the number of images within that month.
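For a single pixel, the two products listed above can be illustrated with a small sketch. The FAPAR values are invented, the percentile uses linear interpolation (an assumption about the method), and the trend is an ordinary least-squares fit that returns only the slope and intercept, not the p-values or R2 shipped with the dataset.

```python
def percentile(values, q):
    """Linear-interpolation percentile, with q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def monthly_stats(composites):
    """p05, p50 and p95 of the composites assigned to one month."""
    return {name: percentile(composites, q)
            for name, q in (("p05", 5), ("p50", 50), ("p95", 95))}


def linear_trend(y):
    """Ordinary least-squares (slope, intercept) of y against 0, 1, 2, ..."""
    n = len(y)
    x_mean, y_mean = (n - 1) / 2, sum(y) / n
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    beta = sxy / sxx
    return beta, y_mean - beta * x_mean


# Five hypothetical 8-day composites that fall into one month's window.
stats = monthly_stats([40, 50, 60, 70, 80])
# Hypothetical monthly p95 values for four consecutive time steps.
beta, alpha = linear_trend([70, 72, 74, 76])
print(stats["p50"], beta, alpha)  # 60.0 2.0 70.0
```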
Data Details
Support
If you discover a bug, artifact, or inconsistency, or if you have a question please raise a GitHub issue: https://github.com/Open-Earth-Monitor/Global_FAPAR_250m/issues
Reference
Hackländer, J., Parente, L., Ho, Y.-F., Hengl, T., Simoes, R., Consoli, D., Şahin, M., Tian, X., Herold, M., Jung, M., Duveiller, G., Weynants, M., Wheeler, I., (2023?) "Land potential assessment and trend-analysis using 2000–2021 FAPAR monthly time-series at 250 m spatial resolution", submitted to PeerJ, preprint available at: https://doi.org/10.21203/rs.3.rs-3415685/v1
Name convention
To ensure consistency and ease of use across and within the projects, we follow the standard Open-Earth-Monitor file-naming convention. The convention works with 10 fields that describe important properties of the data. In this way users can search files, prepare data analysis, etc., without needing to open files. The fields are:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Australia Aggregate Monthly Hours Worked: Trend: Part Time: Female data was reported at 225,972.411 Hour th in Jan 2025. This records an increase from the previous number of 225,859.486 Hour th for Dec 2024. Australia Aggregate Monthly Hours Worked: Trend: Part Time: Female data is updated monthly, averaging 123,113.874 Hour th (median) from Jul 1978 to Jan 2025, with 559 observations. The data reached an all-time high of 225,972.411 Hour th in Jan 2025 and a record low of 46,197.368 Hour th in Jul 1978. Australia Aggregate Monthly Hours Worked: Trend: Part Time: Female data remains in active status in CEIC and is reported by Australian Bureau of Statistics. The data is categorized under Global Database’s Australia – Table AU.G052: Aggregate Monthly Hours Worked.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Script and accompanying IPython notebook written in Python 2.7 for aggregating sub-daily environmental data (rainfall, tide, wind, groundwater) to a daily timescale. The input data are from Norfolk, Virginia. Several different methods of aggregation are used, including averages and maximums. The processed/aggregated data are combined with street flood report data to be used in data-driven, predictive modeling. The script in this resource was used in the analysis described in this Journal of Hydrology paper: https://doi.org/10.1016/j.jhydrol.2018.01.044.
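The kind of sub-daily to daily aggregation described here can be sketched as below. This is an illustration in Python 3 and not the original Python 2.7 script; the function name, the sample readings, and the choice of reducers are assumptions.

```python
import statistics
from collections import defaultdict
from datetime import datetime


def aggregate_daily(records, how):
    """Reduce (timestamp, value) pairs to one value per calendar day.

    `how` is any function mapping a list of numbers to a single number,
    for example max for a daily maximum or statistics.fmean for a daily average.
    """
    by_day = defaultdict(list)
    for stamp, value in records:
        by_day[stamp.date()].append(value)
    return {day: how(values) for day, values in sorted(by_day.items())}


# Hypothetical sub-daily tide readings.
records = [
    (datetime(2016, 9, 1, 0), 1.0),
    (datetime(2016, 9, 1, 12), 3.0),
    (datetime(2016, 9, 2, 6), 2.0),
]
daily_max = aggregate_daily(records, max)
daily_mean = aggregate_daily(records, statistics.fmean)
print(list(daily_max.values()), list(daily_mean.values()))  # [3.0, 2.0] [2.0, 2.0]
```

Passing the reducer as an argument keeps one grouping routine for every variable, so each series can use whichever daily statistic suits it.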
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
India CS: Aggregate Deposits of Residents: Time data was reported at 103,241,410.000 INR mn in Sep 2018. This records an increase from the previous number of 102,539,530.000 INR mn for Aug 2018. India CS: Aggregate Deposits of Residents: Time data is updated monthly, averaging 30,494,680.000 INR mn (median) from Mar 1999 to Sep 2018, with 235 observations. The data reached an all-time high of 103,241,410.000 INR mn in Sep 2018 and a record low of 5,454,360.000 INR mn in Mar 1999. India CS: Aggregate Deposits of Residents: Time data remains in active status in CEIC and is reported by Reserve Bank of India. The data is categorized under Global Database’s India – Table IN.KAC003: Commercial Bank Survey.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper investigates whether there is time variation in the excess sensitivity of aggregate consumption growth to anticipated aggregate disposable income growth using quarterly US data over the period 1953-2014. Our empirical framework contains the possibility of stickiness in aggregate consumption growth and takes into account measurement error and time aggregation. Our empirical specification is cast into a Bayesian state-space model and estimated using Markov chain Monte Carlo (MCMC) methods. We use a Bayesian model selection approach to deal with the non-regular test for the null hypothesis of no time variation in the excess sensitivity parameter. Anticipated disposable income growth is calculated by incorporating an instrumental variables estimation approach into our MCMC algorithm. Our results suggest that the excess sensitivity parameter in the USA is stable at around 0.23 over the entire sample period.
Original provider: Paolo Casale Dataset credits: Data provider WWF Italy's Sea Turtle Network Originating data center Satellite Tracking and Analysis Tool (STAT) Supplemental information: Visit STAT's project page for additional information. This dataset is a summarized representation of the telemetry locations aggregated per species per 1-degree cell. AccConID=24 AccConstrDescription=This license lets others remix, tweak, and build upon your work non-commercially, and although their new works must also acknowledge you and be non-commercial, they don’t have to license their derivative works on the same terms AccConstrDisplay=This dataset is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. AccConstrEN=Attribution-NonCommercial (CC BY-NC) AccessConstraint=Attribution-NonCommercial (CC BY-NC) AccessConstraints=This dataset is a summarized representation of the telemetry locations aggregated per species per 1-degree cell. Acronym=None added_date=2024-06-04 11:58:39.543000 BrackishFlag=0 CDate=2023-04-12 cdm_data_type=Other CheckedFlag=0 Citation=Casale P. 2021. WWF Italy. Data originated from Satellite Tracking and Analysis Tool (STAT; http://www.seaturtle.org/tracking/index.shtml?project_id=184). Comments=None ContactEmail=paolo.casale1@gmail.com Conventions=COARDS, CF-1.6, ACDD-1.3 CurrencyDate=None DasID=8288 DasOrigin=None DasType=None DasTypeID=None DateLastModified={'date': '2025-02-18 01:34:01.301036', 'timezone_type': 1, 'timezone': '+01:00'} DescrCompFlag=0 DescrTransFlag=0 Easternmost_Easting=49.5 EmbargoDate=None EngAbstract=Original provider: Paolo Casale Dataset credits: Data provider WWF Italy's Sea Turtle Network Originating data center Satellite Tracking and Analysis Tool (STAT) Supplemental information: Visit STAT's project page for additional information. This dataset is a summarized representation of the telemetry locations aggregated per species per 1-degree cell. 
EngDescr=None FreshFlag=0 GBIF_UUID=5e413639-a91c-41ba-aa33-8583c479a3fa geospatial_lat_max=47.5 geospatial_lat_min=25.5 geospatial_lat_units=degrees_north geospatial_lon_max=49.5 geospatial_lon_min=-53.5 geospatial_lon_units=degrees_east infoUrl=None InputNotes=None institution=WWF License=https://creativecommons.org/licenses/by-nc/4.0 Lineage=None MarineFlag=1 modified_sync=2024-05-21 00:00:00 Northernmost_Northing=47.5 OrigAbstract=None OrigDescr=None OrigDescrLang=None OrigDescrLangNL=None OrigLangCode=None OrigLangCodeExtended=None OrigLangID=None OrigTitle=None OrigTitleLang=None OrigTitleLangCode=None OrigTitleLangID=None OrigTitleLangNL=None Progress=None PublicFlag=1 ReleaseDate=Apr 24 2021 12:00AM ReleaseDate0=2021-04-24 RevisionDate=None SizeReference=None sourceUrl=(local files) Southernmost_Northing=25.5 standard_name_vocabulary=CF Standard Name Table v70 StandardTitle=WWF Italy (aggregated per 1-degree cell) StatusID=1 subsetVariables=ScientificName,BasisOfRecord,YearCollected,MonthCollected,DayCollected,sex,lifestage,aphia_id TerrestrialFlag=0 UDate=2023-04-20 VersionDate=Apr 24 2021 12:00AM VersionDay=None VersionMonth=None VersionName=None VersionYear=None VlizCoreFlag=1 Westernmost_Easting=-53.5
NMME CCSM4 Pressure at Sea Level Daily Aggregation R01 PSL Dimensioned By time, latitude, longitude. _CoordSysBuilder=ucar.nc2.dataset.conv.CF1Convention cdm_data_type=Grid contact=Dughong Min (dmin@rsmas.miami.edu) and Ben Kirtman (bkirtman@rsmas.miami.edu) Conventions=CF-1.4 Easternmost_Easting=359.0 endmonth=01 endyear=2026 experiment=Febuary 2025 Forecast experiment_id=Mon Mar 10 12:06:20 PM EDT 2025 frequency=day Generator=NCL v.6.0 geospatial_lat_max=90.0 geospatial_lat_min=-90.0 geospatial_lat_resolution=1.0 geospatial_lat_units=degrees_north geospatial_lon_max=359.0 geospatial_lon_min=0.0 geospatial_lon_resolution=1.0 geospatial_lon_units=degrees_east history=FMRC Best Dataset infoUrl=https://www.ncei.noaa.gov/thredds/catalog/model-nmme_ccsm4_psl_day_r01_agg/catalog.html?dataset=model-nmme_ccsm4_psl_day_r01_agg/NMME_CCSM4_Pressure_at_Sea_Level_Daily_Aggregation_R01_best.ncd institution=Univ. of Miami - Rosenstiel School of Marine & Atmosphereric Science institution_id=UM-RSMAS location=Proto fmrc:NMME_CCSM4_Pressure_at_Sea_Level_Daily_Aggregation_R01 model_id=CCSM4_0_a02 modeling_realm=atmos Northernmost_Northing=90.0 project_id=National Multi-Model Ensembles(NMME) project realization=01 References=Ben P. Kirtman, Dughong Min. (2009) Multimodel Ensemble ENSO Prediction with CCSM and CFS. Monthly Weather Review 137:9, 2908-2930 sourceUrl=https://www.ncei.noaa.gov/thredds/dodsC/model-nmme_ccsm4_psl_day_r01_agg/NMME_CCSM4_Pressure_at_Sea_Level_Daily_Aggregation_R01_best.ncd Southernmost_Northing=-90.0 startmonth=02 startyear=2025 time_coverage_end=2026-02-28T12:00:00Z time_coverage_start=2018-01-01T12:00:00Z Westernmost_Easting=0.0
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Huntington’s disease is a dominantly inherited neurodegenerative disorder caused by the expansion of a CAG repeat, encoding the amino acid glutamine (Q), present in the first exon of the protein huntingtin. Above the threshold of Q39, HTT exon 1 (HTTEx1) tends to misfold and aggregate into large intracellular structures, but whether these end-stage aggregates or their on-pathway intermediates are responsible for cytotoxicity is still debated. HTTEx1 can be separated into three domains: an N-terminal 17 amino acid region (N17), the polyglutamine (polyQ) expansion, and a C-terminal proline-rich domain (PRD). Alongside the expanded polyQ, these flanking domains influence the aggregation propensity of HTTEx1, with the N17 initiating and promoting aggregation, and the PRD modulating it. In this study we focus on the first 11 amino acids of the PRD, a stretch of pure prolines, which are an evolutionarily recent addition to the expanding polyQ region. We hypothesize that this proline region is expanding alongside the polyQ to counteract its ability to misfold and cause toxicity, and that expanding this proline region would be overall beneficial. We generated HTTEx1 mutants lacking each of the flanking domains individually, missing the first 11 prolines of the PRD, or with this stretch of prolines expanded. We then followed their aggregation landscape in vitro with a battery of biochemical assays, and in vivo in novel models of C. elegans expressing the HTTEx1 mutants pan-neuronally. Employing fluorescence lifetime imaging we could observe the aggregation propensity of all HTTEx1 mutants during aging and correlate this with toxicity via various phenotypic assays. We found that the presence of an expanded proline stretch is beneficial in maintaining HTTEx1 soluble over time, regardless of polyQ length.
However, the expanded prolines were only advantageous in promoting the survival and fitness of an organism carrying a pathogenic stretch of Q48 but were extremely deleterious to the nematode expressing a physiological stretch of Q23. Our results reveal the unique importance of the prolines, which have evolved, and are still evolving, alongside expanding glutamines to promote the function of HTTEx1 and avoid pathology.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data scarcity and discontinuity are common in healthcare and epidemiological datasets, yet such data are often needed to make educated decisions and forecast upcoming scenarios. To avoid these problems, the data are often processed as monthly or yearly aggregates, on which prevalent forecasting tools such as Autoregressive Integrated Moving Average (ARIMA), Seasonal Autoregressive Integrated Moving Average (SARIMA), and TBATS often fail to provide satisfactory results. Artificial data synthesis methods have proven to be a powerful tool for tackling these challenges. The paper proposes a novel algorithm, Stochastic Bayesian Downscaling (SBD), based on the Bayesian approach, that can regenerate downscaled time series of varying time lengths from aggregated data while preserving most of the statistical characteristics and the aggregated sum of the original data. The paper presents two epidemiological time series case studies from Bangladesh (Dengue, Covid-19) to showcase the workflow of the algorithm. The case studies illustrate that the synthesized data agree with the original data regarding statistical properties, trend, seasonality, and residuals. In terms of forecasting performance, using the last 12 years of Dengue infection data in Bangladesh, we were able to decrease error terms by up to 72.76% using synthetic data over actual aggregated data.
https://spdx.org/licenses/CC0-1.0.html
This dataset and code provide radar-based detections of Brazilian free-tailed bats (Tadarida brasiliensis) across select regions of California and Texas, compiled using weather radar data from the NEXRAD (NEXt-generation weather RADar) system. NEXRAD radars, operated by the US National Weather Service, continuously monitor the airspace, detecting various airborne organisms including birds, insects, and bats. The dataset was generated using the ‘BATS’ Python toolkit (program included), which automates the retrieval, processing, and classification of radar data. It employs a pre-trained machine learning model specifically designed to detect radar echoes associated with Brazilian free-tailed bats. The dataset includes the results from machine learning models trained and tested on radar data, which achieved an AUC of 0.963, demonstrating high accuracy in identifying bat activity. The dataset also includes pre-trained neural network and random forest models for reproducibility. This dataset provides valuable spatiotemporal information on bat presence at a large landscape scale and across extended timeframes. By distilling radar data into efficient summaries of bat occurrence, the dataset enables researchers to explore patterns in bat activity and their potential ecosystem services, such as insect consumption, in agricultural regions.
Methods
Data Description
This dataset provides detailed radar-based detections of Brazilian free-tailed bats (Tadarida brasiliensis) across select regions of California and Texas. The data were compiled from the NEXRAD (NEXt-generation weather RADar) system, which operates S-band Doppler weather radars across the United States. NEXRAD radars detect various airborne targets such as birds, insects, and bats. The dataset is processed using the 'BATS' Python toolkit, which automates the retrieval and classification of radar data. Using radar data sourced from the Amazon Web Services (AWS) repository, the BATS toolkit classifies radar echoes based on a machine learning model trained to identify Brazilian free-tailed bats. The dataset contains bat presence information at a pixel resolution of 70 meters, derived from radar data over multiple time periods in 2018 and 2019. This data will be useful for researchers exploring bat ecology, insectivorous bat ecosystem services, and landscape-level bat monitoring. The dataset includes:
- Radar data processed to detect bat presence in California (2018) and Texas (2019)
- Classified radar pixels indicating bat presence or absence
- Machine learning-derived bat occurrence probabilities (thresholded for binary classification)
- Geotiff files that aggregate radar data over six-month periods
Methods
Data Collection
The dataset was generated using NEXRAD radar data, sourced from AWS. The BATS Python toolkit facilitated the collection and processing of radar data files, automating the pipeline from raw radar retrieval to bat detection. Radar data was selected based on specific regions, timeframes, and weather conditions associated with confirmed Brazilian free-tailed bat emergence events. The radar data collected spans 11 weather-free days in California (2018) and 7 days in Texas (2019). Reference data on bat emergence was gathered from field observations provided by local bat monitoring organizations.
Data Processing
Once downloaded, the raw radar data (Level II “.gz” files) was processed using the Py-ART library, which is designed for radar data manipulation. Py-ART converted the radar data from its native polar coordinates into a uniform Cartesian grid, with a resampled pixel resolution of 70 meters to facilitate accurate bat detection. The processed radar data was then classified using a machine learning pipeline. The BATS toolkit includes scripts for classification, in which radar echoes were evaluated by pre-trained machine learning models. The dataset was classified using three machine learning models: random forest (RF), support vector machines (SVM), and artificial neural networks (ANN). The ANN model, selected for its superior performance (AUC of 0.963), was used to classify each radar pixel as either containing or not containing Brazilian free-tailed bats. The model outputs a binary classification based on a 90% probability threshold to ensure accurate detection while minimizing false positives.
Evaluation and Quality Control
To ensure the accuracy of the model and its classifications, the dataset was evaluated using standard binary classification metrics: precision, recall, AUC (Area Under the ROC Curve), and precision-recall curves. Hyperparameter tuning and spatial cross-validation were performed to account for spatial autocorrelation in the radar data and to improve the generalization of the machine learning models. Training data for the model was primarily sourced from California, while independent testing was conducted using radar data from Texas. The dataset also includes labeled data representing noise sources (such as birds, vehicles, and weather phenomena) to reduce false positives during classification. By processing large volumes of radar data and applying machine learning algorithms, the BATS toolkit condensed terabytes of raw radar data into concise geotiff maps of bat presence, enabling efficient analysis of bat populations across landscapes.
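The thresholding and evaluation steps described above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the BATS toolkit: the function names and the sample probabilities and labels are hypothetical, and treating a probability exactly equal to 0.9 as a detection is an assumption.

```python
def classify_pixels(probabilities, threshold=0.9):
    """Turn per-pixel bat probabilities into binary presence flags.

    Assumption: a pixel counts as a detection when its probability is at
    or above the threshold (the source only states a 90% threshold).
    """
    return [p >= threshold for p in probabilities]


def precision_recall(predicted, actual):
    """Compute precision and recall for binary predictions."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # true positives
    fp = sum(1 for p, a in pairs if p and not a)    # false positives
    fn = sum(1 for p, a in pairs if not p and a)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical model outputs and reference labels for six radar pixels.
probs = [0.97, 0.91, 0.85, 0.40, 0.95, 0.10]
labels = [True, True, True, False, False, False]

pred = classify_pixels(probs)
precision, recall = precision_recall(pred, labels)
```

A high threshold such as 0.9 generally trades recall for precision: fewer pixels are flagged, so fewer of the flagged pixels are false positives, but some true bat echoes with lower scores are missed.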
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The effect of extracellular polymeric substances (EPS) extracted from Sagittula stellata and natural Gulf of Mexico bacterial consortia on aggregation of 1 micron latex particles on a crude oil drop surface is observed. Crude oil droplets between 100-200 microns are pinned in a microchannel while time lapse microscopy observes particle aggregation with time. Aggregation rates correlate positively with increasing protein-carbohydrate ratios in the varying EPS compositions studied.
CARESAT is a project funded by the Tuscany Region (Italy) aiming to use satellite telemetry to increase the limited information currently available on the movements of loggerhead turtles frequenting Tuscany waters and the Pelagos Marine Sanctuary. To this aim, turtles found in Tuscan waters and rehabilitated in Tuscan rescue centers will be equipped with satellite transmitters, to reconstruct the movements made by tracked individuals, to identify the areas of the Sanctuary that are mainly frequented, and to reveal hitherto unknown aspects of their ecology and behavior. AccConID=24 AccConstrDescription=This license lets others remix, tweak, and build upon your work non-commercially, and although their new works must also acknowledge you and be non-commercial, they don’t have to license their derivative works on the same terms AccConstrDisplay=This dataset is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. AccConstrEN=Attribution-NonCommercial (CC BY-NC) AccessConstraint=Attribution-NonCommercial (CC BY-NC) AccessConstraints=None Acronym=None added_date=2023-04-27 11:03:36.983000 BrackishFlag=None CDate=2023-02-08 cdm_data_type=Other CheckedFlag=0 Citation=Luschi P. 2021. CARESAT. Data originated from Satellite Tracking and Analysis Tool (STAT; http://www.seaturtle.org/tracking/index.shtml?project_id=1050). Comments=Only data aggregated per 1-degree cell are available through OBIS. The non-aggregated data are available through the OBIS-SEAMAP Portal. 
ContactEmail=pluschi@biologia.unipi.it Conventions=COARDS, CF-1.6, ACDD-1.3 CurrencyDate=None DasID=8204 DasOrigin=Sensor platform DasType=Data DasTypeID=1 DateLastModified={'date': '2025-02-13 01:37:29.538097', 'timezone_type': 1, 'timezone': '+01:00'} DescrCompFlag=0 DescrTransFlag=0 Easternmost_Easting=14.5 EmbargoDate=None EngAbstract=CARESAT is a project funded by the Tuscany Region (Italy) aiming to use satellite telemetry to increase the limited information currently available on the movements of loggerhead turtles frequenting Tuscany waters and the Pelagos Marine Sanctuary. To this aim, turtles found in Tuscan waters and rehabilitated in Tuscan rescue centers will be equipped with satellite transmitters, to reconstruct the movements made by tracked individuals, to identify the areas of the Sanctuary that are mainly frequented, and to reveal hitherto unknown aspects of their ecology and behavior. EngDescr=Original provider: Islameta Group, University of Pisa
Dataset credits: Data provider Islameta Group, Dept. of Biology - University of Pisa Originating data center Satellite Tracking and Analysis Tool (STAT) Project partner Parco Regionale della Maremma (Maremma Regional Park) Project sponsor or sponsor description Osservatorio Toscano Cetacei e Tartarughe (Tuscan Observatory Cetaceans and Turtles)
Supplemental information: Visit STAT's project page for additional information.
This dataset is a summarized representation of the telemetry locations aggregated per species per 1-degree cell. FreshFlag=None GBIF_UUID=03fdaaf6-227f-4731-9ee4-72ad5b28d80d geospatial_lat_max=47.5 geospatial_lat_min=37.5 geospatial_lat_units=degrees_north geospatial_lon_max=14.5 geospatial_lon_min=-16.5 geospatial_lon_units=degrees_east infoUrl=None InputNotes=None institution=None License=https://creativecommons.org/licenses/by-nc/4.0 Lineage=None MarineFlag=1 modified_sync=2023-03-31 00:00:00 Northernmost_Northing=47.5 OrigAbstract=None OrigDescr=None OrigDescrLang=None OrigDescrLangNL=None OrigLangCode=None OrigLangCodeExtended=None OrigLangID=None OrigTitle=None OrigTitleLang=None OrigTitleLangCode=None OrigTitleLangID=None OrigTitleLangNL=None Progress=None PublicFlag=1 ReleaseDate=Jul 11 2021 10:00PM ReleaseDate0=2021-07-11 RevisionDate=None SizeReference=None sourceUrl=(local files) Southernmost_Northing=37.5 standard_name_vocabulary=CF Standard Name Table v70 StandardTitle=CARESAT (aggregated per 1-degree cell) StatusID=1 subsetVariables=ScientificName,BasisOfRecord,YearCollected,MonthCollected,DayCollected,sex,lifestage,aphia_id TerrestrialFlag=None time_coverage_end=2020-08-01T01:00:00Z time_coverage_start=2014-01-01T01:00:00Z UDate=2023-11-20 VersionDate=Jul 11 2021 10:00PM VersionDay=12 VersionMonth=7 VersionName=None VersionYear=2021 VlizCoreFlag=1 Westernmost_Easting=-16.5
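The per-cell aggregation mentioned in this record ("telemetry locations aggregated per species per 1-degree cell") can be sketched as follows. This is an assumed illustration, not the OBIS or OBIS-SEAMAP procedure: the function names and sample coordinates are hypothetical, and labelling each cell by its centre is a guess that is merely consistent with the .5-degree bounds in the metadata.

```python
import math
from collections import Counter


def cell_centre(lat, lon):
    """Return the centre of the 1-degree cell containing a location.

    Assumption: cells are aligned to whole degrees and labelled by their
    centre, e.g. latitude 37.6 falls in the cell centred on 37.5.
    """
    return (math.floor(lat) + 0.5, math.floor(lon) + 0.5)


def aggregate_per_cell(locations):
    """Count (lat, lon) locations per 1-degree cell."""
    return Counter(cell_centre(lat, lon) for lat, lon in locations)


# Hypothetical telemetry fixes as (latitude, longitude) in decimal degrees.
fixes = [(42.3, 10.7), (42.9, 10.1), (43.2, 9.8), (37.6, -16.2)]

counts = aggregate_per_cell(fixes)
```

Using `math.floor` rather than truncation matters for negative longitudes: -16.2 belongs to the cell spanning -17 to -16, not the one spanning -16 to -15.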
Best time series, taking the data from the most recent run available.