Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Protein aggregation occurs when misfolded or unfolded proteins physically bind together, and it can cause the development of various amyloidosis diseases. The goal of this study was to construct surrogate models for predicting protein aggregation using data-driven methods with two types of databases. This study suggests which approach is more effective for predicting protein aggregation, depending on the types of descriptors and database.
In the fall of 2013, the Detroit Blight Removal Task Force commissioned Data Driven Detroit, the Michigan Nonprofit Association, and LOVELAND Technologies to conduct a survey of every parcel in the City of Detroit. The goal of the survey was to collect data on property condition and vacancy. The effort, called Motor City Mapping, leveraged relationships with the Rock Ventures family of companies and the Detroit Employment Solutions Corporation to assemble a dedicated team of over 200 resident surveyors, drivers, and quality control associates. Data collection occurred from December 4, 2013 until February 16, 2014, and the initiative resulted in survey information for over 370,000 parcels of land in the city of Detroit, identifying condition, occupancy, and use. The data were then extensively reviewed by the Motor City Mapping quality control team, a process that concluded on September 30, 2014. This file contains the official certified results from the Winter 2013/2014 survey, aggregated to 2010 Census Tracts for easy mapping and analysis. The topics covered in the dataset include totals and calculated percentages for parcels in the categories of illegal dumping, fire damage, structural condition, existence of a structure or accessory structure, and improvements on lots without structures. Metadata associated with this file includes field description metadata and a narrative summary documenting the process of creating the dataset.
http://dcat-ap.ch/vocabulary/licenses/terms_by
This dataset shows the aggregated results of the National Council elections of 22 October 2023. Please note that the officially valid final results are published in the cantonal journal of the Canton of Basel-Stadt.
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
CERF Organization Publications and Aggregated Budget in the Past Three Years
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1 millions; DD = directly dependent crops; ID = indirectly dependent crops; x = year; na = not applicable; df = 1 all effects.
Public Domain Mark 1.0 https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically
The Western and Central Pacific Fisheries Commission (WCPFC) has compiled a public domain version of aggregated catch and effort data using operational, aggregate and annual catch estimates data provided by Commission Members (CCMs) and Cooperating Non-members (CNMs). The data provided herein have been prepared for dissemination in accordance with the current “Rules and Procedures for the Protection, Access to, and Dissemination of Data Compiled by the Commission” (the “RAP”).
Paragraph 9 of the Rules and Procedures indicates that "Catch and Effort data in the public domain shall be made up of observations from a minimum of three vessels". However, the majority of aggregate data provided to WCPFC do not indicate how many vessels were active in each cell of data, which would allow data to be directly filtered according to this rule. Instead, the individual cells where "effort" is less than or equal to the maximum value estimated to represent the activities of two vessels have been removed from the public domain data (the cells are retained with their time/area information, but all catch and effort information in these cells has been set to zero). Statistics showing how much data have been removed according to this RAP requirement are provided in the documentation for the longline and purse seine public domain data.
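The masking step described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the WCPFC procedure: the field names (`year`, `month`, `lat5`, `lon5`, `effort`, `catch_mt`) and the threshold value are hypothetical, since the real field names are documented in the PDF shipped with each zip file and the real two-vessel thresholds are estimated by the Commission.

```python
from typing import Dict, List, Sequence

# Hypothetical threshold: the maximum effort estimated to represent the
# activity of two vessels in one cell. The real value is not stated here.
TWO_VESSEL_MAX_EFFORT = 50.0

# Hypothetical names of the time/area fields that are always retained.
KEY_FIELDS = ("year", "month", "lat5", "lon5")


def mask_low_effort_cells(
    cells: List[Dict],
    threshold: float = TWO_VESSEL_MAX_EFFORT,
    key_fields: Sequence[str] = KEY_FIELDS,
) -> List[Dict]:
    """Zero catch and effort in cells at or below the threshold.

    Cells are kept with their time/area information; every other field is
    assumed to be a numeric catch or effort value and is set to zero.
    """
    masked = []
    for cell in cells:
        if cell["effort"] <= threshold:
            masked.append({k: (v if k in key_fields else 0) for k, v in cell.items()})
        else:
            masked.append(dict(cell))
    return masked


cells = [
    {"year": 2019, "month": 7, "lat5": "05S", "lon5": "160E", "effort": 30.0, "catch_mt": 12.5},
    {"year": 2019, "month": 7, "lat5": "10S", "lon5": "165E", "effort": 400.0, "catch_mt": 180.0},
]
masked = mask_low_effort_cells(cells)
```

In this example the first cell keeps its year, month and grid position but loses its catch and effort values, while the second cell is returned unchanged.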
All public domain data have been aggregated by year/month and 5°x5° grid. Annex 2 of the RAP indicates that public domain aggregated catch/effort data can be made available at a higher resolution (e.g. data with a breakdown by vessel nation, and aggregated by 1°x1° grids for surface fisheries); however, if the public domain data were provided at these higher levels of resolution, implementation of the RAP "three-vessel rule" with the current aggregate data set would result in too many cells being removed.
However, please note that the data that have been removed from the public domain dataset, available on this webpage, are still potentially accessible via other provisions of the RAP (refer to section 4.6 and para 34).
Each public domain zip file contains two files: (1) a CSV file containing the data; (2) a PDF file containing the field names/formats and the coverage with respect to the data file.
These data files were last updated on 27 July 2020.
Survey results from University of Stirling. This dataset is not publicly accessible because: University of Stirling developed the survey and keeps the results. It can be accessed through the following means: Contact University of Stirling. Format: University of Stirling developed the survey and keeps the results. Group on Earth Observation AquaWatch distributed the survey and aggregated results. School of Wine & Spirits Business, Burgundy School of Business - Université Bourgogne Franche-Comté conducted statistical analysis. This dataset is associated with the following publication: Agnoli, L., E. Urquhart, N. Georgantzis, B. Schaeffer, R. Simmons, B. Hoque, M.B. Neely, C. Neil, J. Oliver, and A. Tyler. Perspectives on user engagement of satellite Earth observation for water quality management. Technological Forecasting and Social Change. ELSEVIER, AMSTERDAM, HOLLAND, 189: 122357, (2023).
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
The characterization of shoreline strips was carried out on 400 km of agricultural watercourses for the territory of 5 MRCs in Montérégie (Beauharnois-Salaberry, Haut-Richelieu, Jardins-de-Napierville, Marguerite-d'Youville, Vallée-du-Richelieu). The results obtained by photo-interpretation, based on the width of the sections of shoreline strips calculated from the high-water line and the embankment, were aggregated to produce global results by municipality. The project was carried out as part of the Regional Program for the Acquisition of Data on Wetlands and Water Environments (PRADMHH) and was funded by the Regions and Rurality Fund (FRR) of the Montérégie regional department of the MAMH.
Criteria used to characterize the conformity of shoreline strips (riparian compliance, according to the width of the shoreline strip):
- Non-compliant: total width of less than 3 meters
- Almost compliant: total width of 3 meters or more, but a width of less than one meter on the embankment
- Compliant: total width of three meters or more and a width of a minimum of one meter on the embankment
- Exceptional: total width of 5 meters or more and a width of 3 meters or more from the embankment
**This third party metadata element was translated using an automated translation tool (Amazon Translate).**
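The compliance criteria above amount to a small decision rule, sketched here in Python. This is an illustrative reading of the criteria, not the project's own code: the function and variable names are hypothetical, and the "Exceptional" threshold on the embankment (3 meters or more) is an interpretation of the machine-translated text. The municipality names in the example are placeholders.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def classify_shoreline_strip(total_width_m: float, embankment_width_m: float) -> str:
    """Return the compliance class for one shoreline strip section."""
    if total_width_m < 3:
        return "Non-compliant"
    if embankment_width_m < 1:
        return "Almost compliant"
    # Assumed reading: exceptional needs >= 5 m total and >= 3 m on the embankment.
    if total_width_m >= 5 and embankment_width_m >= 3:
        return "Exceptional"
    return "Compliant"


def tally_by_municipality(
    sections: Iterable[Tuple[str, float, float]]
) -> Dict[str, Counter]:
    """Count sections per compliance class for each municipality.

    Each section is (municipality, total_width_m, embankment_width_m).
    """
    tallies: Dict[str, Counter] = {}
    for municipality, total_w, embankment_w in sections:
        label = classify_shoreline_strip(total_w, embankment_w)
        tallies.setdefault(municipality, Counter())[label] += 1
    return tallies


sections = [
    ("Municipality A", 2.5, 2.0),
    ("Municipality A", 4.0, 1.0),
    ("Municipality B", 5.0, 3.0),
]
tallies = tally_by_municipality(sections)
```

The tally counts sections only; the published results may instead aggregate by length of watercourse, which this sketch does not attempt.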
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The results of DCN-based aggregated neighbor-counting method on 9 single-domain proteins.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data scarcity and discontinuity are common in healthcare and epidemiological datasets, which are often needed to make an educated decision and forecast the upcoming scenario. To avoid these problems, such data are often processed as monthly or yearly aggregates, for which prevalent forecasting tools like Autoregressive Integrated Moving Average (ARIMA), Seasonal Autoregressive Integrated Moving Average (SARIMA), and TBATS often fail to provide satisfactory results. Artificial data synthesis methods have proven to be a powerful tool for tackling these challenges. The paper proposes a novel algorithm, named Stochastic Bayesian Downscaling (SBD), based on the Bayesian approach, that can regenerate downscaled time series of varying time lengths from aggregated data while preserving most of the statistical characteristics and the aggregated sum of the original data. The paper presents two epidemiological time series case studies from Bangladesh (Dengue, Covid-19) to showcase the workflow of the algorithm. The case studies illustrate that the synthesized data agree with the original data regarding statistical properties, trend, seasonality, and residuals. In terms of forecasting performance, using the last 12 years of Dengue infection data in Bangladesh, we were able to decrease error terms by up to 72.76% using synthetic data over actual aggregated data.
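The one property of downscaling that the abstract states precisely is that the regenerated series keeps the aggregated sum of the original data. The Python sketch below illustrates only that constraint. It is not the SBD algorithm, whose details are not given here: the random weighting scheme, the rounding step and all names are assumptions chosen for illustration.

```python
import random
from typing import List


def downscale_preserving_sum(total: int, n_periods: int, seed: int = 0) -> List[int]:
    """Split an aggregated count into n_periods non-negative integers.

    The parts always add up to `total`. Weights are random (normalised
    gamma draws), so no trend or seasonality is modelled.
    """
    rng = random.Random(seed)
    weights = [rng.gammavariate(1.0, 1.0) for _ in range(n_periods)]
    scale = total / sum(weights)
    raw = [w * scale for w in weights]

    # Largest-remainder rounding: floor every value, then hand the leftover
    # units to the entries with the largest fractional parts.
    parts = [int(x) for x in raw]
    shortfall = total - sum(parts)
    by_remainder = sorted(
        range(n_periods), key=lambda i: raw[i] - parts[i], reverse=True
    )
    for i in by_remainder[:shortfall]:
        parts[i] += 1
    return parts


# Example: a hypothetical monthly total of 310 cases spread over 31 days.
daily = downscale_preserving_sum(310, 31)
```

A method such as SBD would additionally aim to reproduce the statistical characteristics of the original series, which this sketch does not attempt.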
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The dataset on aggregate extractions in the European seas was created in 2014 by AZTI for the European Marine Observation and Data Network (EMODnet). It is the result of the aggregation and harmonization of datasets provided by several sources from all across Europe. It is available for viewing and download on EMODnet web portal (Human Activities, https://emodnet.ec.europa.eu/en/human-activities). The dataset contains points representing aggregate extraction sites, by year (although some data are indicated by a period of years), in the following countries: Belgium, Denmark, Finland, France, Germany, Ireland, Italy, Lithuania, Poland, Portugal, Spain, Sweden, The Netherlands and United Kingdom. Where available, each point has the following attributes: Id (Identifier), Position Info (e.g.: Estimated, Original, Polygon centroid of dredging area, Estimated polygon centroid of dredging area), Country, Sea basin, Sea, Name of the extraction area, Area of activity (km2), Year (the year when the extraction took place; when a time period is available, the first year of the period is indicated), Permitted Amount (m3) (permitted amount of material to be extracted, in m3), Permitted Amount (t) (permitted amount of material to be extracted, in tonnes), Requested Amount (m3) (requested amount of material to be extracted, in m3), Requested Amount (t) (requested amount of material to be extracted, in tonnes), Extracted Amount (m3) (extracted amount of material, in m3), Extracted Amount (t) (extracted amount of material, in tonnes), Extraction Type (Marine sediment extraction), Purpose (e.g.: Commercial, Others, N/A), End Use (e.g.: Beach nourishment, Construction, Reclamation fill, N/A), Material type (e.g.: sand, gravel, maerl), Notes, Link to Web Sources. In 2018, a feature on areas for aggregate extractions was included. 
It contains polygons representing areas of seabed licensed for exploration or extraction of aggregates, in the following countries: Belgium, Denmark, Estonia, Finland, France, Germany, Italy, Lithuania, Poland, Portugal, Russia, Spain, Sweden, The Netherlands and United Kingdom. Where available, each polygon has the following attributes: Id (Identifier), Area code, Area name, Country, Sea basin, Sea, Starting year (the year when the license starts), End year (the year when the license ends), Site Type (exploration area, extraction area, extraction area (in use)), License status (Active, not active, expired, unknown), Material type (e.g.: sand, gravel, maerl), Notes, Distance to coast (in metres), Link to Web Sources. In the 2024 update, extraction data until 2023 and new areas have been included.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets and results associated with "Sample, estimate, aggregate: A recipe for causal discovery foundation models"
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Population results from the census, disaggregated by gender and age
https://www.icpsr.umich.edu/web/ICPSR/studies/1119/terms
Despite extensive research into the nature and determinants of party identification, links between individual-level partisan persistence and the degree of permanence in aggregate-level partisanship have largely been ignored. The failure to link the two levels of analysis leaves a gap in our collective understanding of the dynamics of aggregate partisanship. To remedy this, a set of ideal types is identified in this collection that captures the essential arguments made about individual-level party identification. The behavioral assumptions for each ideal type are then combined with existing results on statistical aggregation to deduce the specific temporal pattern that each ideal type implies for aggregate levels of partisanship. Using new diagnostic tests and a highly general time series model, the investigators found that aggregate measures of partisanship from 1953 through 1992 are fractionally integrated. The evidence that the effects of a shock to aggregate partisanship last for years -- not months or decades -- challenges previous work by party systems theorists (e.g., Burnham, 1970) and students of "macropartisanship" (e.g., MacKuen, Erikson, and Stimson, 1989). The arguments and empirical evidence on the degree of persistence in macro-level partisanship provide a conceptually richer and empirically more precise basis for existing theories -- such as those of issue evolution (Carmines and Stimson, 1989) or endogenous preferences (Gerber and Jackson, 1993) -- in which partisanship plays a central role.
https://www.ine.es/aviso_legal
Main structural results by type of indicator and educational level (aggregate). National.
Results of the Fulton County Citizen Survey aggregated to combine all positive responses and all negative responses for each question to support Socrata goal pages.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The DIAMAS project investigates Institutional Publishing Service Providers (IPSP) in the broadest sense, with a special focus on those publishing initiatives that do not charge fees to authors or readers. To collect information on Institutional Publishing in the ERA, a survey was conducted among IPSPs between March and May 2024. This dataset contains aggregated data from the 685 valid responses to the DIAMAS survey on Institutional Publishing.
The dataset supplements D2.3 Final IPSP landscape Report Institutional Publishing in the ERA: results from the DIAMAS survey.
The data
Basic aggregate tabular data
Full individual survey responses are not being shared to prevent the easy identification of respondents (in line with conditions set out in the survey questionnaire). This dataset contains full tables with aggregate data for all questions from the survey, with the exception of free-text responses, from all 685 survey respondents. This includes, per question, overall totals and percentages for the answers given, as well as the breakdown by both IPSP types: institutional publishers (IPs) and service providers (SPs). Tables at country level have not been shared, as cell values often turned out to be too low to prevent potential identification of respondents. The data is available in csv and docx formats, with csv files grouped and packaged into ZIP files. Metadata describing data type, question type, as well as question response rate, is available in csv format. The R code used to generate the aggregate tables is made available as well.
Files included in this dataset
survey_questions_data_description.csv - metadata describing data type, question type, as well as question response rate per survey question.
tables_raw_all.zip - raw tables (csv format) with aggregated data per question for all respondents, with the exception of free-text responses. Questions with multiple answers have a table for each answer option. Zip file contains 180 csv files.
tables_raw_IP.zip - as tables_raw_all.zip, for responses from institutional publishers (IP) only. Zip file contains 180 csv files.
tables_raw_SP.zip - as tables_raw_all.zip, for responses from service providers (SP) only. Zip file contains 170 csv files.
tables_formatted_all.docx - formatted tables (docx format) with aggregated data per question for all respondents, with the exception of free-text responses. Questions with multiple answers have a table for each answer option.
tables_formatted_IP.docx - as tables_formatted_all.docx, for responses from institutional publishers (IP) only.
tables_formatted_SP.docx - as tables_formatted_all.docx, for responses from service providers (SP) only.
DIAMAS_Tables_single.R - R script used to generate raw tables with aggregated data for all single response questions
DIAMAS_Tables_multiple.R - R script used to generate raw tables with aggregated data for all multiple response questions
DIAMAS_Tables_layout.R - R script used to generate document with formatted tables from raw tables with aggregated data
DIAMAS Survey on Institutional Publishing - data availability statement (pdf)
All data are made available under a CC0 license.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this study, we merge results of two recent directions in efficiency analysis research, aggregation and bootstrap, applied, as an example, to one of the most popular point estimators of individual efficiency: the data envelopment analysis (DEA) estimator. A natural context of the methodology developed here is a study of efficiency of a particular economic system (e.g., an industry) as a whole, or a comparison of efficiencies of distinct groups within such a system (e.g., regulated vs. non-regulated firms or private vs. public firms). Our methodology is justified by the (neoclassical) economic theory and is supported by carefully adapted statistical methods.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Research datasets about top signals for covid 19 (coronavirus), for a study of Google Trends (GT) with SEO metrics
Website
The study is currently published on the https://covidgilance.org website (in French)
Datasets description
covid signals -> |selection| -> 4 datasets -> |serp.py| -> 4 serp datasets -> |aggregate_serp.pl| -> 4 aggregated serp datasets -> |prepare datasets| -> 4 ranked top seo datasets
Original lists of signals (mainly covid symptoms) - dataset
Description: contains the original list of relevant signals for covid19 (a list of queries for which a relevant signal is visible in GT during the covid 19 period)
Name: covid_signal_list.tsv
List of content:
- id: unique id for the topic
- topic-fr: name of the topic in French
- topic-en: name of the topic in English
- topic-id: GT topic id
- keyword fr: one or several keywords in French for GT
- keyword en: one or several keywords in English for GT
- fr-topic-url-12M: link to 12-months French query topic in GT in France
- en-topic-url-12M: link to 12-months English query topic in GT in US
- fr-url-12M: link to 12-months French queries in GT in France
- en-url-12M: link to 12-months English queries topic in GT in US
- fr-topic-url-5M: link to 5-months French query topic in GT in France
- en-topic-url-5M: link to 5-months English query topic in GT in US
- fr-url-5M: link to 5-months French queries in GT in France
- en-url-5M: link to 5-months English queries topic in GT in US
Tool to get SERP of covid signals - tool
Description: queries Google with a list of covid signals and returns a list of SERPs in CSV (in fact TSV) file format
Name: serper.py
python serper.py
SERP files - datasets
Description: SERP results for 4 datasets of queries
Names:
simple version of covid signals from google.ch in French: serp_signals_20_ch_fr.csv
simple version of covid signals from google.com in English: serp_signals_20_en.csv
amplified version of covid signals from google.ch in French: serp_signals_covid_20_ch_fr.csv
amplified version of covid signals from google.com in English: serp_signals_covid_20_en.csv
The amplified version means that for each query we create two queries, one with the keyword "covid" and one with "coronavirus"
Tool to aggregate SERP results - tool
Description: loads CSV SERP data and aggregates it into a new CSV file where each line is a website and each column is a query.
Name: aggregate_serp.pl
perl aggregate_serp.pl > aggregated_signals_20_en.csv
Datasets of top websites from the SERP results - dataset
Description: an aggregated version of the SERP where each line is a website and each column a query
Names:
aggregated_signals_20_ch_fr.csv
aggregated_signals_20_en.csv
aggregated_signals_covid_20_ch_fr.csv
aggregated_signals_covid_20_en.csv
List of content:
- domain: domain name of the website
- signal 1: Position of the query 1 (signal 1) in the SERP, where 30 arbitrarily indicates that this website is not present in the SERP
- signal ...: Position of the query (signal) in the SERP, where 30 arbitrarily indicates that this website is not present in the SERP
- signal n: Position of the query n (signal n) in the SERP, where 30 arbitrarily indicates that this website is not present in the SERP
- total: average position (sum of all positions divided by the number of queries)
- missing: Total number of missing results in the SERP for this website
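The layout described above (one row per website, one column per query, 30 as the placeholder for an absent website) can be sketched in Python as follows. This is an illustration and not a port of aggregate_serp.pl: the input layout of `(query, domain, position)` tuples is hypothetical, and keeping the best position when a domain appears several times for one query is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Placeholder position for a website absent from a SERP (from the description).
MISSING_POSITION = 30


def aggregate_serp(
    rows: Iterable[Tuple[str, str, int]], queries: List[str]
) -> List[Dict]:
    """Build one record per domain with a position column per query.

    rows: hypothetical (query, domain, position) tuples.
    queries: the ordered list of queries (signals) to use as columns.
    """
    best: Dict[str, Dict[str, int]] = defaultdict(dict)
    for query, domain, position in rows:
        previous = best[domain].get(query)
        # Assumption: keep the best (lowest) position per domain and query.
        if previous is None or position < previous:
            best[domain][query] = position

    table = []
    for domain in sorted(best):
        positions = [best[domain].get(q, MISSING_POSITION) for q in queries]
        record = {"domain": domain}
        record.update(zip(queries, positions))
        # "total" is the average position over all queries.
        record["total"] = sum(positions) / len(queries)
        record["missing"] = sum(1 for q in queries if q not in best[domain])
        table.append(record)
    return table


rows = [("fever", "who.int", 1), ("fever", "cdc.gov", 2), ("cough", "cdc.gov", 1)]
table = aggregate_serp(rows, ["fever", "cough"])
```

Because absent websites count as position 30, a domain that ranks first for one query but is missing from the other ends up with a much worse average than a domain present in both.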
Datasets ranked top seo - dataset
Description: a ranked (by weighted average position) version of the aggregated SERP, where each line is a website and each column a query. The top 20 websites have additional information about type and HONcode validity (as of the collection date: September 2020)
Names:
ranked_signals_20_ch_fr.csv
ranked_signals_20_en.csv
ranked_signals_covid_20_ch_fr.csv
ranked_signals_covid_20_en.csv
List of content:
- domain: domain name of the website
- signal 1: Position of the query 1 (signal 1) in the SERP, where 30 arbitrarily indicates that this website is not present in the SERP
- signal ...: Position of the query (signal) in the SERP, where 30 arbitrarily indicates that this website is not present in the SERP
- signal n: Position of the query n (signal n) in the SERP, where 30 arbitrarily indicates that this website is not present in the SERP
- avg position: average position (sum of all positions divided by the number of queries)
- nb missing: Total number of missing results in the SERP for this website
- % presence: % of presence
- weighted avg postion: combination of avg position and % of presence for final ranking
- honcode: status of the Honcode certificate for this website (none/valid/expired)
- type: type of the website (health, gov, edu or media)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data to report https://doi.org/10.1002/ange.202314667:
J-aggregates are highly desired dye aggregates but so far there has been no general concept how to accomplish the required slip-stacked packing arrangement for dipolar merocyanine (MC) dyes whose aggregation commonly affords one-dimensional aggregates composed of antiparallel, co-facially stacked MCs with H-type coupling. Herein we describe a strategy for MC J-aggregates based on our results for an amphiphilic MC dye bearing alkyl and oligo(ethylene glycol) side chains. In an aqueous solvent mixture, we observe the formation of two supramolecular polymorphs for this MC dye, a metastable off-pathway nanoparticle showing H-type coupling and a thermodynamically favored nanosheet showing J-type coupling. Detailed studies concerning the self-assembly mechanism by UV-Vis spectroscopy and the packing structure by atomic force microscopy and wide-angle X-ray scattering show how the packing arrangement of such amphiphilic MC dyes can afford slip-stacked two-dimensional nanosheets whose macrodipole is compensated by the formation of a bilayer structure. As an additional feature we demonstrate how the size of the nanosheets can be controlled by seeded living supramolecular polymerization.