Database of curated links to molecular resources, tools, and databases selected on the basis of recommendations from bioinformatics experts in the field. This resource relies on input from its community of bioinformatics users for suggestions. Since 2003, it has also listed all links contained in the NAR Web Server issue. The types of information available in this portal:
* Computer Related: Links to resources relating to programming languages often used in bioinformatics. Other tools of the trade, such as web development and database resources, are also included here.
* Sequence Comparison: Tools and resources for the comparison of sequences, including sequence similarity searching, alignment tools, and general comparative genomics resources.
* DNA: Links to useful resources for DNA sequence analysis, such as tools for comparative sequence analysis and sequence assembly. Links to programs for sequence manipulation, primer design, and sequence retrieval and submission are also listed here.
* Education: Links to information about the techniques, materials, people, places, and events of the greater bioinformatics community. Included are current news headlines, literature sources, educational material, and links to bioinformatics courses and workshops.
* Expression: Links to tools for predicting the expression, alternative splicing, and regulation of a gene sequence. This section also contains links to databases, methods, and analysis tools for protein expression, SAGE, EST, and microarray data.
* Human Genome: Links to draft annotations of the human genome, in addition to resources for sequence polymorphisms and genomics. Also included are links related to ethical discussions surrounding the study of the human genome.
* Literature: Links to resources related to published literature, including tools to search for articles and through literature abstracts. Additional text mining resources, open access resources, and literature goldmines are also listed.
* Model Organisms: Links to resources for various model organisms, ranging from mammals to microbes, including databases and tools for genome-scale analyses.
* Other Molecules: Bioinformatics tools related to molecules other than DNA, RNA, and protein, including resources for the bioinformatics of small molecules as well as other biopolymers such as carbohydrates and metabolites.
* Protein: Links to useful resources for protein sequence and structure analyses. Resources for phylogenetic analyses, prediction of protein features, and analyses of interactions are also found here.
* RNA: Links to sequence retrieval programs, structure prediction and visualization tools, motif search programs, and information on various functional RNAs.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset merges and unifies seven individual datasets, giving it the longest record and the widest coverage available for flood studies in the US. Both the individual databases and the unified database are provided to accommodate different user needs. It is anticipated that this database can support a variety of flood-related research, for example as a validation resource for hydrologic or hydraulic simulations, for climatic studies of spatiotemporal flood patterns given its long-term and U.S.-wide coverage, and for flood susceptibility analysis of vulnerable geophysical locations.
Description of filenames:
1. cyberFlood_1104.csv – a web-based crowdsourced flood database developed at the University of Oklahoma (Wan et al., 2014). 203 flood events from 1998 to 2008 are retrieved with the latest version. Data accessed on 11/04/2020. Data attributes: ID, Year, Month, Day, Duration, fatality, Severity, Cause, Lat, Long, Country Code, Continent Code.
2. DFO.xlsx – the Dartmouth Flood Observatory flood database, a tabular global flood database compiled from news, government agencies, stream gauges, and remote sensing instruments from 1985 to the present. Data accessed on 10/27/2020. Data attributes: ID, GlideNumber, Country, OtherCountry, long, lat, Area, Began, Ended, Validation, Dead, Displaced, MainCause, Severity.
3. emdat_public_2020_11_01_query_uid-MSWGVQ.xlsx – the Emergency Events Database (EM-DAT). This flood report is managed by the Centre for Research on the Epidemiology of Disasters in Belgium and covers all types of global natural disasters from 1900 to the present. Data accessed on 11/01/2020. Data attributes: Dis No, Year, Seq, Disaster Group, Disaster Subgroup, Disaster Type, Disaster Subtype, Disaster Subsubtype, Event Name, Entity Criteria, Country, ISO, Region, Continent, Location, Origin, Associated Disaster, Associated Disaster2, OFDA Response, Appeal, Declaration, Aid Contribution, Disaster Magnitude, Latitude, Longitude, Local Time, River Basin, Start Year, Start Month, Start Day, End Year, End Month, End Day, Total Death, No. Injured, No. Affected, No. Homeless, Total Affected, Reconstruction, Insured Damages, Total Damages, CPI.
4. extracted_events_NOAA.csv – the National Weather Service storm reports. The NOAA NWS team collects weather-related natural hazards from 1950 to the present. Data accessed on 10/27/2020. Data attributes: BEGIN_YEARMONTH, BEGIN_DAY, BEGIN_TIME, END_YEARMONTH, END_DAY, END_TIME, EPISODE_ID, EVENT_ID, STATE, STATE_FIPS, YEAR, MONTH_NAME, EVENT_TYPE, CZ_TYPE, CZ_FIPS, CZ_NAME, WFO, BEGIN_DATETIME, CZ_TIMEZONE, END_DATE_TIME, INJURIES_DIRECT, INJURIES_INDIRECT, DEATHS_DIRECT, DEATHS_INDIRECT, DAMAGE_PROPERTY, DAMAGE_CROPS, SOURCE, MAGNITUDE, MAGNITUDE_TYPE, FLOOD_CAUSE, CATEGORY, TOR_F_SCALE, TOR_LENGTH, TOR_WIDTH, TOR_OTHER_WFO, TOR_OTHER_CZ_STATE, TOR_OTHER_CZ_FIPS, BEGIN_RANGE, BEGIN_AZIMUTH, BEGIN_LOCATION, END_RANGE, END_AZIMUTH, END_LOCATION, BEGIN_LAT, BEGIN_LON, END_LAT, END_LON, EPISODE_NARRATIVE, EVENT_NARRATIVE, DATA_SOURCE.
5. FEDB_1118.csv – the University of Connecticut Flood Events Database. Floods retrieved from 6,301 stream gauges in the U.S. after flow separation, from 2002 to 2013 (Shen et al., 2017). Data accessed on 11/18/2020. Data attributes: STCD, StartTimeP, EndTimeP, StartTimeF, EndTimeF, Perc, Peak, RunoffCoef, IBF, Vp, Vb, Vt, Pmean, ETr, ELs, VarTr, VarLs, EQ, Q2, CovTrLs, Category, Geometry.
6. GFM_events.csv – Global Flood Monitoring dataset, a crowdsourced flood database derived from tweets worldwide since 2014. Data accessed on 11/09/2020. Data attributes: event_id, location_ID, location_ID_url, name, type, country_location_ID, country_ISO3, start, end, time of detection.
7. mPing_1030.csv – Meteorological Phenomena Identification Near the Ground (mPing). The mPing app is crowdsourcing weather-reporting software jointly developed by the NOAA National Severe Storms Laboratory (NSSL) and the University of Oklahoma (Elmore et al., 2014). Data accessed on 10/30/2020. Data attributes: id, obtime, category, description, description_id, lon, lat.
8. USFD_v1.1.csv – the merged United States Flood Database from 1900 to the present (UPDATED). Data attributes: DATE_BEGIN, DATE_END, DURATION, LON, LAT, COUNTRY, STATE, AREA, FATALITY, DAMAGE, SEVERITY, SOURCE, CAUSE, SOURCE_DB, SOURCE_ID, DESCRIPTION, SLOPE, DEM, LULC, DISTANCE_RIVER, CONT_AREA, DEPTH, YEAR.
Details of attributes:
DATE_BEGIN: begin datetime of an event, in yyyymmddHHMMSS format
DATE_END: end datetime of an event, in yyyymmddHHMMSS format
DURATION: duration of an event in hours
LON: longitude in degrees
LAT: latitude in degrees
COUNTRY: United States of America
STATE: US state name
AREA: affected area in km^2
FATALITY: number of fatalities
DAMAGE: economic damages in US dollars
SEVERITY: event severity (1/1.5/2) according to DFO
SOURCE: flood information source
CAUSE: flood cause
SOURCE_DB: source database, from items 1-7
SOURCE_ID: original ID in the source database
DESCRIPTION: event description
SLOPE: slope calculated from the 90 m SRTM DEM
DEM: Digital Elevation Model
LULC: Land Use Land Cover
DISTANCE_RIVER: distance to the major river network in km
CONT_AREA: contributing area (km^2), from MERIT Hydro
DEPTH: 500-yr flood depth
YEAR: year of the event
9. attribution_table.xlsx – a description of each database, with URLs provided to retrieve these databases.
The scripts used to merge all sources and to produce the figures can be found at https://github.com/chrimerss/USFD. If you intend to use this dataset, please cite our description paper: Li, Z., Chen, M., Gao, S., Gourley, J. J., Yang, T., Shen, X., Kolar, R., and Hong, Y.: A multi-source 120-year US flood database with a unified common format and public access, Earth Syst. Sci. Data, 13, 3755–3766, https://doi.org/10.5194/essd-13-3755-2021, 2021.
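As a hedged illustration of working with the unified database, the sketch below loads USFD_v1.1.csv with pandas and parses the DATE_BEGIN/DATE_END fields from the yyyymmddHHMMSS format described above. The file and column names come from the attribute list; dtypes and missing-value handling are assumptions.

```python
import pandas as pd

# Load the unified US Flood Database; keep the datetime fields as strings first.
usfd = pd.read_csv("USFD_v1.1.csv", dtype={"DATE_BEGIN": str, "DATE_END": str})

# DATE_BEGIN / DATE_END are documented as yyyymmddHHMMSS strings.
for col in ("DATE_BEGIN", "DATE_END"):
    usfd[col] = pd.to_datetime(usfd[col], format="%Y%m%d%H%M%S", errors="coerce")

# Example: number of events and median duration (hours) per source database.
print(usfd.groupby("SOURCE_DB")["DURATION"].agg(["count", "median"]))
```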
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Irish Drought Impacts Database (IDID) presents information from drought-related newspaper articles for the island of Ireland, covering the period 1733 – 2019, collected through systematic searching of the Irish Newspaper Archive. These articles were identified, assessed, and categorized using a modified version of the classification scheme employed by the European Drought Impact Inventory (EDII) (Stahl et al. 2012). The IDID provides information on the documented temporal and geographical extent of drought events, their socio-economic and political contexts, their consequences, and the mitigation strategies employed. Temporal information includes the newspaper publication date, the timing and duration of drought periods, and the timing of impacts. Spatial details are provided at three levels: in addition to Nomenclature of Territorial Units for Statistics Level 2 (NUTS2) regions, the county and, if available, a more localized place name (e.g., a town or city name) are recorded. The database allows the analysis of long-term patterns in drought incidence and impacts, as well as offering insights into the impacts of individual drought events over nearly three centuries of Ireland's history. Spatially specific data provide an opportunity for further exploration of the differential vulnerability of various geographical locations, which may vary depending on biophysical conditions and economic, political, and societal context. Textual excerpts offer further insight into the human experience and perception of drought and its impacts. The IDID therefore enables a better understanding of drought and vulnerability, and offers a new open-access tool for multi-disciplinary investigation of drought on the island of Ireland.
Spanish Fake News Dataset
This dataset contains a structured and annotated collection of false news items in Spanish (Castilian), gathered and processed for academic research on misinformation.
Dataset Scope
The dataset represents most of the recorded false news messages and their variations up to 01.02.2021.
Content Description
The dataset includes samples of false information in various formats:
News articles and headlines
Tweets and Facebook/Instagram/Telegram posts
YouTube video captions
WhatsApp text and voice message transcripts
Transcribed video/audio fragments with false claims
Fake government documents
Captions from photos and memes
Text extracted from images using OCR
Only Spanish (Castilian) texts were used, excluding regional variants (e.g., Catalan, Basque, Galician) for consistency.
Sources
The data was collected from the following verified fact-checking initiatives:
Maldito Bulo
Newtral
AFP Factual
Fact-checkers from these organizations provide detailed articles identifying and explaining falsehoods, often including:
General context of the event
Quotes or links to false claims
Analysis and explanation of why the claims are false
Verified information or corrections
Collection Method
The dataset was built using both manual extraction (e.g., identifying and quoting false statements) and automated parsing:
MyNews service: an archive of Spanish mass media
Custom scripts: for parsing and extracting structured data
OCR tools: for extracting text from images (e.g., memes and screenshots)
Fields Description
| Column Name | Description |
|---|---|
| Topic | The thematic category of the news item (e.g., Politics, Health, COVID-19, Crime). Normalized and translated to English. |
| Link source | URL to the original news piece, fact-check report, or source of the claim. Invalid links were removed. |
| Media | The platform or outlet where the false claim appeared (e.g., Facebook, YouTube, WhatsApp). Normalized for consistent spelling and language. |
| Date | Publication or verification date of the news item, in YYYY-MM-DD format. |
| Author | (Optional) Author of the news or platform source, if available. May be empty. |
| Headlines | Title or summary of the news item or article containing the false information. |
| Fake statement | Quoted false claim or misinformation as cited in the verification article. |
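As a minimal, hedged sketch of consuming this schema, the snippet below loads the annotated CSV with pandas and parses the Date column. The file name spanish_fake_news.csv is hypothetical; only the column names and the date format come from the table above.

```python
import pandas as pd

# Hypothetical file name; the dataset's actual file name is not given above.
df = pd.read_csv("spanish_fake_news.csv")

# Date is documented as YYYY-MM-DD; coerce malformed values to NaT.
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d", errors="coerce")

# Example: count fake statements per platform and topic.
print(df.groupby(["Media", "Topic"]).size().sort_values(ascending=False).head(10))
```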
⚠️ Notes
The dataset was preprocessed to remove duplicates, invalid links, and non-textual clutter.
Field values were normalized to support multilingual and cross-platform analysis.
Only Castilian Spanish was retained for consistency and clarity.
📚 License & Use
This dataset is intended for non-commercial academic and research purposes. Please cite the original fact-checking organizations and this dataset if used in publications or analysis.
https://creativecommons.org/publicdomain/zero/1.0/
This folder contains data behind the stories:
* The Media Really Has Neglected Puerto Rico
* The Media Really Started Paying Attention To Puerto Rico When Trump Did
Data about Online News was collected using the Media Cloud dashboard, an open source suite of tools for analyzing a database of online news.
mediacloud_hurricanes.csv contains the number of sentences per day that mention Hurricanes Harvey, Irma, Jose, and Maria in online news.
mediacloud_states.csv (updated through 10/10/2017) contains the number of sentences per day that mention Puerto Rico, Texas, and Florida in online news.
mediacloud_trump.csv (updated through 10/10/2017) contains the number of headlines that mention Puerto Rico, Texas, and Florida, as well as headlines that mention those three locations along with 'President' or 'Trump'.
mediacloud_top_online_news.csv contains a list of sources included in Media Cloud's "U.S. Top Online News" collection.
TV News data was collected from the Internet TV News Archive using the Television Explorer tool.
tv_hurricanes.csv contains the percent of sentences per day in TV News that mention Hurricanes Harvey, Irma, Jose, and Maria.
tv_hurricanes_by_network.csv contains the percent of sentences per day in TV News, per network, that mention Hurricanes Harvey, Irma, Jose, and Maria.
tv_states.csv (updated through 10/10/2017) contains the percent of sentences per day in TV News that mention Puerto Rico, Texas, and Florida.
Google search data was collected using the Google Trends dashboard.
google_trends.csv contains data on Google Trends searches for Hurricanes Harvey, Irma, Jose, and Maria.
This is a dataset from FiveThirtyEight hosted on their GitHub. Explore FiveThirtyEight data using Kaggle and all of the data sources available through the FiveThirtyEight organization page!
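As an illustrative sketch only, the snippet below loads mediacloud_hurricanes.csv and plots the daily sentence counts per hurricane. The file name comes from the list above, but the column layout (a Date column plus one column per hurricane) is an assumption and may need adjusting to the actual header row.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed layout: a "Date" column plus one count column per hurricane.
df = pd.read_csv("mediacloud_hurricanes.csv", parse_dates=["Date"])
df = df.set_index("Date")

df.plot()  # one line per hurricane column
plt.ylabel("Sentences per day mentioning each hurricane")
plt.tight_layout()
plt.show()
```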
This dataset is maintained using GitHub's API and Kaggle's API.
This dataset is distributed under the Attribution 4.0 International (CC BY 4.0) license.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The database was built using the G1 portal (g1.globo.com) as the news source, and 72 news items related to work accidents were recorded. The search was filtered on the keywords “accident” and “industry”. News items published between 2018 and 2022 were prioritized during data mining. For some records the “Process” category could not be filled in because that information was not available in the news item; as a result, only 56 records include a value for this category. Each recorded news item corresponds to a single accident/event and is characterized by 12 main attributes:
The severity of each event was classified into 5 levels, following the chemical-industry scale of Rathnayaka et al. (2011) [1], listed in increasing order of severity:
The industrial activities (related to the “Process” attribute) that appeared in the database can be described by 7 categories:
industrial_accidents_in_brazil_from_news_EN.csv: The database in English.
industrial_accidents_in_brazil_from_news_BR.csv: The database in Portuguese.
CNAE_dict_EN.pkl: A Python dictionary containing the decoded values for the CNAE code (English version).
CNAE_dict_BR.pkl: A Python dictionary containing the decoded values for the CNAE code (Portuguese version).
To import a CNAE_dict.pkl file, execute the following Python code:

import pickle
with open('CNAE_dict.pkl', 'rb') as f:
    code_dict = pickle.load(f)
[1] RATHNAYAKA, S.; KHAN, F.; AMYOTTE, P. SHIPP methodology: Predictive accident modeling approach. Part II. Validation with case study. Process Safety and Environmental Protection, v. 89, n. 2, p. 75–88, mar. 2011.
Warning: as of June 2020, this dataset is no longer updated and has been replaced. Please see https://www.donneesquebec.ca/recherche/fr/dataset/evenements-de-securite-civile for data on civil security events since June 2020.
This database brings together, in a structured way, information on past disaster events that has been systematically grouped and centralized by the Ministry of Public Security (MSP). The consequences and evolution of the events are documented, and the events have been categorized according to their level of impact on the safety of citizens, goods, and services to the population, based on criteria defined in the Canadian Profile of the Common Alerting Protocol. The database is updated continuously by the MSP Operations Department (DO). It will allow analyses to be carried out at regional and local levels and can be used by municipalities in the implementation of their emergency measures plans.
The event history archives come from event reports and status reports produced by the Government Operations Centre (COG) and by the regional directorates of the MSP. Among other things, they include:
1- Observations entered directly into the Geoportal by the civil security advisers of the regional directorates;
2- A compilation of information recorded in COG event reports and DO status reports distributed to MSP partners since 1996;
3- A compilation of the information contained in the files of the regional directorates. This can be information on paper, event reports or field visits, paper or digital maps, etc.
The information in this database is consistent with the Canadian Profile of the Common Alerting Protocol (PC-PAC). The PC-PAC is a set of rules and controlled values that support the translation and composition of a message so that it can be sent by different means and from different sources.
The severity level is an attribute defined in the PC-PAC. It characterizes the severity of the event based on the harm to the lives of people or damage to property, and takes the following values:
Extreme: an extraordinary threat to life or property;
Significant: a significant threat to life or property;
Moderate: a possible threat to life or property;
Minor: a low or non-existent threat to life or property;
Unknown: unknown severity, used for example during tests and exercises.
The emergency level is determined based on the reactive measures that need to be taken in response to the current situation. It takes the following values:
Immediate: reactive action must be taken immediately;
Expected: reactive action should be taken soon (within the next hour);
Future: reactive action should be taken in the near future;
Past: a reactive measure is no longer necessary;
Unknown: unknown emergency, to be used during tests and exercises.
The state relates to the context of the event, real or simulated, and takes the following values:
Current: information on a real event or situation;
Exercise: fictional or real information produced as part of a civil security exercise;
Test: technical tests only; to be ignored by all.
Certainty takes the following values:
Observed: has occurred or is currently taking place;
Probable: probability of the event happening > 50%;
Possible: probability of the event happening < 50%;
Unlikely: probability of the event happening around 0%;
Unknown: certainty unknown.
When an event date was not known, the date 1900-01-01 was recorded.
DESCRIPTION OF ATTRIBUTES:
Observation date: date of the event or observation;
Type: name of the hazard;
Name: name of the municipality;
Municipality code: municipal code;
State and certainty: as these are real events, the state is generally “current” and the certainty is generally “observed”;
Urgency: the term “past” has generally been used for events that occurred before compilation work was carried out;
Imprecision: imprecision in data (the date of the event, its location, the source of the data or no inaccuracy noted).
**This third party metadata element was translated using an automated translation tool (Amazon Translate).**
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is composed of powerlifters who've competed in the United States at raw, full-power events from 2015 to now. All data was aggregated from OpenPowerlifting (see acknowledgments below) to create a smaller dataset with relevant American information. Use this dataset to conduct EDA on American powerlifters and to answer questions you may have about discrepancies within the powerlifting community in the USA.
The file is composed of 38 columns of information to select from, such as basic lifter information, weights lifted, and competition location. The following sub-sections provide a breakdown of each category in the dataset, as provided by the README.txt document that accompanies the OpenPowerlifting dataset download.
Mandatory. The name of the lifter in UTF-8 encoding.
Lifters who share the same name are distinguished by use of a # symbol followed by a unique number. For example, two lifters both named John Doe would have Name values John Doe #1 and John Doe #2 respectively.
Mandatory. The sex category in which the lifter competed, M, F, or Mx.
Mx (pronounced Muks) is a gender-neutral title — like Mr and Ms — originating from the UK. It is a catch-all sex category that is particularly appropriate for non-binary lifters.
Mandatory. The type of competition that the lifter entered. For the purposes of this dataset, all event values will be SBD for lifters who've competed in events testing their squat, bench, and deadlift.
Mandatory. The equipment category under which the lifts were performed. For the purposes of this dataset, all values will be Raw, as it contains information about lifters who utilize minimal equipment.
Optional. The age of the lifter on the start date of the meet, if known.
Ages can be one of two types: exact or approximate. Exact ages are given as integer numbers, for example 23. Approximate ages are given as an integer plus 0.5, for example 23.5.
Approximate ages mean that the lifter could be either of two possible ages. For an approximate age of n + 0.5, the possible ages are n or n+1. For example, a lifter with the given age 23.5 could be either 23 or 24 -- we don't have enough information to know.
Approximate ages occur because some federations only provide us with birth year information. So another way to think about approximate ages is that 23.5 implies that the lifter turns 24 that year.
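To make the exact-versus-approximate convention concrete, here is a small, illustrative Python helper (not part of the dataset) that expands an Age value into the set of possible integer ages:

```python
def possible_ages(age: float) -> tuple[int, ...]:
    """Return the possible integer ages encoded by an Age value.

    Exact ages (e.g., 23) map to a single age; approximate ages
    (n + 0.5, e.g., 23.5) mean the lifter is either n or n + 1.
    """
    if age == int(age):          # exact age, e.g., 23
        return (int(age),)
    n = int(age)                 # approximate age, e.g., 23.5 -> 23 or 24
    return (n, n + 1)

print(possible_ages(23))    # (23,)
print(possible_ages(23.5))  # (23, 24)
```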
Optional. The age class in which the lifter falls, for example 40-45. These classes are based on the exact age of the lifter on the day of competition.
AgeClass is mostly useful because sometimes a federation will report that a lifter competed in the 50-54 division without providing any further age information. This way, we can still tag them as 50-54, even if the Age column is empty.
Optional. The birth year class in which the lifter falls, for example 40-49. The ages in the range are the oldest possible ages for the lifter that year. For example, 40-49 means "the year the lifter turns 40 through the full year in which the lifter turns 49."
BirthYearClass is used primarily by the IPF and by IPF affiliates. Non-IPF federations tend to use AgeClass instead.
Optional. Free-form UTF-8 text describing the division of competition, like Open or Juniors 20-23 or Professional.
Some federations are configured in our database, which means that we have agreed on a limited set of division options for that federation, and we have rewritten their results to only use that set, and tests enforce that. Even still, divisions are not standardized between configured federations: it really is free-form text, just to provide context.
Information about age should not be extracted from the Division, but from the AgeClass column.
Optional. The recorded bodyweight of the lifter at the time of competition, to two decimal places.
Optional. The weight class in which the lifter competed, to two decimal places.
Weight classes can be specified as a maximum or as a minimum. Maximums are specified by just the number; for example, 90 means "up to (and including) 90kg." Minimums are specified by a + to the right of the number; for example, 90+ means "above (and excluding) 90kg."
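As an illustrative sketch of that convention (not an official parser), the helper below turns a weight-class string into a pair of bounds in kilograms, using None for the unbounded side:

```python
from typing import Optional, Tuple

def parse_weight_class(wc: str) -> Tuple[Optional[float], Optional[float]]:
    """Parse a weight-class string into (lower, upper) bounds in kg.

    "90"  -> (None, 90.0)   # up to and including 90 kg
    "90+" -> (90.0, None)   # strictly above 90 kg
    """
    wc = wc.strip()
    if wc.endswith("+"):
        return (float(wc[:-1]), None)
    return (None, float(wc))

print(parse_weight_class("90"))   # (None, 90.0)
print(parse_weight_class("90+"))  # (90.0, None)
```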
Optional. First attempts for each of squat, bench, and deadlift, respectively. Maximum of two decimal places.
Negative values indicate failed attempts.
Not all federations report attempt information. Some federations only report Best attempts.
Optional. Second attempts for each of squat, bench, and deadlift, respectively. Maximum of two decimal places.
Negative values indicate failed attempts.
Not all federations report attempt ...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Graph theory provides a systematic method for modeling and analysing complicated biological data as an effective bioinformatics tool. Based on current trends, the number of DNA sequences in the DNA database is growing quickly. To determine the origin of a species and identify homologous sequences, it is crucial to detect similarities in DNA sequences. Alignment-free techniques are required for accurate measures of sequence similarity, which has been one of the main issues facing computational biologists. The current study provides a mathematical technique for comparing DNA sequences that are constructed in graph theory. The sequences of each DNA were divided into pairs of nucleotides, from which weighted loop digraphs and corresponding weighted vectors were computed. To check the sequence similarity, distance measures like Cosine, Correlation, and Jaccard were employed. To verify the method, DNA segments from the genomes of ten species of cotton were tested. Furthermore, to evaluate the efficacy of the proposed methodology, a K-means clustering method was performed. This study proposes a proof-of-model that utilises a distance matrix approach that promises impressive outcomes with future optimisations to be made to the suggested solution to get the hundred percent accurate result. In the realm of bioinformatics, this paper highlights the use of graph theory as an effective tool for biological data study and sequence comparison. It’s expected that further optimization in the proposed solution can bring remarkable results, as this paper presents a proof-of-concept implementation for a given set of data using the proposed distance matrix technique.
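The exact weighting scheme of the paper's loop digraphs is not spelled out above, so the following is only a simplified, hedged sketch of the alignment-free idea it describes: split each sequence into nucleotide pairs, build a 16-dimensional dinucleotide count vector, and compare sequences with cosine distance.

```python
from itertools import product
from math import sqrt

BASES = "ACGT"
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]  # 16 dinucleotides

def dinucleotide_vector(seq: str) -> list[int]:
    """Count overlapping nucleotide pairs, giving a 16-dimensional vector."""
    seq = seq.upper()
    counts = {p: 0 for p in PAIRS}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    return [counts[p] for p in PAIRS]

def cosine_distance(u: list[int], v: list[int]) -> float:
    """1 - cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

seq1 = "ATGCGTACGTTAGC"
seq2 = "ATGCGTTCGTTAGC"
print(cosine_distance(dinucleotide_vector(seq1), dinucleotide_vector(seq2)))
```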
AHAITD is a comprehensive benchmark dataset designed to support the evaluation of AI-generated text detection tools. The dataset contains 11,580 samples spanning both human-written and AI-generated content across multiple domains. It was developed to address limitations in previous datasets, particularly in terms of diversity, scale, and real-world applicability, and to facilitate research in the detection of AI-generated text by providing a diverse, multi-domain dataset that enables fair benchmarking of detection tools across various writing styles and content categories.
Composition
1. Human-Written Samples (Total: 5,790), collected from:
Open Web Text (2,343 samples)
Blogs (196 samples)
Web Text (397 samples)
QA Platforms (670 samples)
News Articles (430 samples)
Opinion Statements (1,549 samples)
Scientific Research Abstracts (205 samples)
2. AI-Generated Samples (Total: 5,790), generated with:
ChatGPT (1,130 samples)
GPT-4 (744 samples)
Paraphrase Models (1,694 samples)
GPT-2 (328 samples)
GPT-3 (296 samples)
DaVinci (GPT-3.5 variant) (433 samples)
GPT-3.5 (364 samples)
OPT-IML (406 samples)
Flan-T5 (395 samples)
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
[Note: Integrated as part of FoodData Central, April 2019.] The USDA National Nutrient Database for Standard Reference (SR) is the major source of food composition data in the United States and provides the foundation for most food composition databases in the public and private sectors. This is the last release of the database in its current format. SR-Legacy will continue its preeminent role as a stand-alone food composition resource and will be available in the new modernized system currently under development. SR-Legacy contains data on 7,793 food items and up to 150 food components that were reported in SR28 (2015), with selected corrections and updates. This release supersedes all previous releases.
Resources in this dataset:
Resource Title: USDA National Nutrient Database for Standard Reference, Legacy Release. File Name: SR-Leg_DB.zip. Resource Description: Locally stored copy - the USDA National Nutrient Database for Standard Reference as a relational database using Access.
Resource Title: USDA National Nutrient Database for Standard Reference, Legacy Release. File Name: SR-Leg_ASC.zip. Resource Description: ASCII files containing the data of the USDA National Nutrient Database for Standard Reference, Legacy Release.
Resource Title: USDA National Nutrient Database for Standard Reference, Legacy Release. File Name: SR-Leg_ASC.zip. Resource Description: Locally stored copy - ASCII files containing the data of the USDA National Nutrient Database for Standard Reference, Legacy Release.
Our dynamic data offering is designed to provide a comprehensive view of over 108,000 publicly listed companies across the globe. This service is an essential tool for financial analysts, investors, corporate strategists, and market researchers, offering versatile data delivery options.
Key Features:
Rich Company Fundamentals: Access detailed profiles with financials, management information, operational metrics, and strategic insights.
Historical Data Depth: Utilize our extensive historical data for trend analysis and benchmarking.
Flexible Delivery Options:
Bulk Data Access: Ideal for high-volume needs, get comprehensive data in bulk.
Daily Updates: Stay current with daily data refreshes for timely and relevant insights.
API Integration: Seamlessly integrate our data into your systems with our API, ensuring efficient data retrieval and analysis.
Global News Integration: Get the latest news and updates, providing context and insights into market movements and company-specific events.
Intuitive User Interface: Navigate our platform with ease for efficient data retrieval.
Customizable Alerts and Reports: Stay informed with tailored alerts and custom reports.
Expert Support: Rely on our dedicated support team for assistance and guidance.
Benefits:
Enhance investment strategies with diverse and up-to-date data.
Conduct in-depth market research and competitive analysis.
Facilitate strategic planning and risk assessment with varied data access methods.
Support academic research with a reliable data source.
Ideal for:
Investment and Financial Firms
Market Analysts and Economists
Corporate Strategy and Business Development Teams
Academic Researchers in Finance and Economics
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
The National Institute of Food and Agriculture is committed to serving its stakeholders, Congress, and the public by using new technologies to advance greater openness. To strengthen transparency and promote open government, NIFA is providing easy access to data and metrics on how the agency disseminates funding. NIFA is committed to increasing transparency and making technical advancements to ensure that data is easily accessible. The Data Gateway provides the ability to filter and export data. Recently added features to the Congressional District Map and Data Gateway Search make for an improved user experience when searching and reporting information on NIFA-administered grants and projects! New interactive features in the Congressional District Map allow users to see the total amount of funding by state and further to drill down to the individual awards. Funding information is available for awards made from 2011-2015. Simply click on a state listing on the right of the screen. No need to create your own search if you are looking for NIFA funding by Congressional District. Key enhancements in the Data Gateway Search tool include:
A project-based display of data
Embedded help text within the tool
Drop-down lists allowing you to choose the fields you want to search and display
Expanded filter lists
The Current Research Information System (CRIS) provides documentation and reporting for ongoing agricultural, food science, human nutrition, and forestry research, education and extension activities for the United States Department of Agriculture, with a focus on the National Institute of Food and Agriculture (NIFA) grant programs. Projects are conducted or sponsored by USDA research agencies, state agricultural experiment stations, land-grant universities, other cooperating state institutions, and participants in NIFA-administered grant programs, including Small Business Innovation Research and the Agriculture and Food Research Initiative. The Planning, Accountability, & Reporting Staff office of NIFA is responsible for maintaining CRIS.
Resources in this dataset:
Resource Title: NIFA Reporting Portal. File Name: Web Page, URL: https://portal.nifa.usda.gov (main HTML page for the database).
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Most Indonesian datasets are "noisy"—mixing slang (gaul), misspellings, and formal words without distinction. IndoLeX is your "Ground Truth."
It is the only KBBI-validated frequency list based on a massive 110-million-word corpus. Unlike simple lists that treat makan (eat) and memakan (eating) as unrelated tokens, IndoLeX aggregates frequency by Root (Lemma), giving you deep semantic insight.
IndoLeX (Indonesian Lexical Dataset) targets the formal and academic domains, structured around 26,956 root words sourced from the official Indonesian dictionary (KBBI). Meticulously developed in mid-2025, it serves as the definitive baseline for NLP feature engineering, stopword filtering, and linguistic analysis.
Key Insight: The analysis reveals that the top 6,300 roots account for over 97% of all word usage in formal Indonesian contexts.
The dataset was created through a rigorous pipeline using Python 3.11:
Corpus Aggregation (250M Words Raw):
Cleaning & Segregation:
KBBI Validation (The Quality Filter):
Processing Stack:
NLTK, PySastrawi (Stemming), BeautifulSoup4.
IndoLeX_Database.csv (The Deep Dive)
Use this for NLP Normalization and Feature Engineering. It maps every variation to its root.
| Column Name | Data Type | Description |
|---|---|---|
| word | String | The specific word form found in the corpus (e.g., perabaan). |
| frequency | Integer | How often this specific form appeared. |
| category | String | kbbi_direct (Root entry) or kbbi_derived (Morphological product). |
| root | String | The Lemma (Root) of the word (e.g., aba). |
| Definition | String | Official KBBI definition (contains HTML). |
IndoLeX_Root_Frequencies.csv (The Summary)
Use this for Stopword Lists, Vocabulary Curricula, or Difficulty Scoring.
| Column Name | Data Type | Description |
|---|---|---|
| rank | Integer | Frequency rank (1 = Most Common). |
| word | String | The Root Word (Lemma). |
| frequency | Integer | Total frequency of the root plus all its derivatives. |
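As a hedged sketch only (the file and column names are taken from the tables above; everything else, such as separator and encoding, is an assumption), the snippet below builds a word-to-root normalization map from IndoLeX_Database.csv and checks the cumulative-coverage claim against IndoLeX_Root_Frequencies.csv:

```python
import pandas as pd

# Word-form -> root (lemma) map for normalization, from the detailed file.
db = pd.read_csv("IndoLeX_Database.csv")
word_to_root = dict(zip(db["word"], db["root"]))
print(word_to_root.get("perabaan"))  # map a derived form to its KBBI root

# Cumulative coverage of the top-N roots, from the summary file.
roots = pd.read_csv("IndoLeX_Root_Frequencies.csv").sort_values("rank")
coverage = roots["frequency"].cumsum() / roots["frequency"].sum()
print(f"Top 6,300 roots cover {coverage.iloc[6299]:.1%} of word usage")
```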
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective
This project aims to gain a thorough understanding of the characteristics and experiences of psychological mistreatment among older adults, acknowledging the diversity within this population. It also seeks to identify clinical tools and practices for its detection and intervention. While there is extensive literature on mistreatment of older adults, specific studies focusing on psychological aspects and intersecting social and identity dimensions are scarce. The findings will provide valuable insights for policymakers and healthcare professionals, helping to shape interventions and policies aimed at countering mistreatment in the ageing population.
Introduction
Psychological mistreatment involves a range of behaviors, expressions, and gestures, or the lack of appropriate actions, that negatively impact an individual's health and dignity. Often subtle and difficult to detect, this type of mistreatment is prevalent and can coexist with other types of abuse. Examination of psychological mistreatment, shaped by various social and identity dimensions, is lacking in current research, particularly regarding how it is experienced by older adults. This scoping review seeks to map the current knowledge on psychological mistreatment of older adults, while highlighting gaps and future directions for research.
Inclusion criteria
This scoping review will encompass studies that explore the characteristics and experiences of psychological mistreatment among older adults, including their experiences and those of perpetrators and witnesses. It will also identify clinical tools and practices for the detection of, and intervention in, psychological mistreatment in this population.
Method
A scoping review will be undertaken by a multidisciplinary team, examining studies from post-2010, sourced from both bibliographic databases and grey literature, available in English or French. Employing an intersectional framework, the review will use Gender-Based Analysis Plus (GBA+) to examine how different forms of discrimination intersect and shape experiences of mistreatment. That is, this approach will help explore how social and identity dimensions, including gender, age, sexual orientation, ethnicity, socioeconomic status, and health conditions, shape the experiences and manifestations of psychological mistreatment.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Quantitative proteomics employing isobaric reagents has been established as a powerful tool for biological discovery. Current workflows often utilize a dedicated quantitative spectrum to improve quantitative accuracy and precision. A consequence of this approach is a dramatic reduction in the spectral acquisition rate, which necessitates the use of additional instrument time to achieve comprehensive proteomic depth. This work assesses the performance and benefits of online and real-time spectral identification in quantitative multiplexed workflows. A Real-Time Search (RTS) algorithm was implemented to identify fragment spectra within milliseconds as they are acquired using a probabilistic score and to trigger quantitative spectra only upon confident peptide identification. The RTS-MS3 was benchmarked against standard workflows using a complex two-proteome model of interference and a targeted 10-plex comparison of kinase abundance profiles. Applying the RTS-MS3 method provided the comprehensive characterization of a 10-plex proteome in 50% less acquisition time. These data indicate that the RTS-MS3 approach provides dramatic performance improvements for quantitative multiplexed experiments.