The RAFT benchmark (Realworld Annotated Few-shot Tasks) focuses on naturally occurring tasks and uses an evaluation setup that mirrors deployment.
RAFT is a few-shot classification benchmark that tests language models:
across multiple domains (lit reviews, medical data, tweets, customer interaction, etc.)
on economically valuable classification tasks (someone inherently cares about the task)
with evaluation that mirrors deployment (50 labeled examples per task, info retrieval allowed, hidden test set)
Description from: https://raft.elicit.org/
jjovalle99/raft-dataset-aws-wellarchitected dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for EC-RAFT Raw ClinicalTrials.gov Dataset
Dataset Summary
This dataset provides a structured version of ClinicalTrials.gov data from , prepared for use in the EC-RAFT framework. It includes structured eligibility criteria (inclusion, exclusion, age, sex), trial descriptions, metadata, interventions, and study design fields. This dataset was used as the foundation for the paper: EC-RAFT: Automated Generation of Clinical Trial Eligibility Criteria through… See the full description on the dataset page: https://huggingface.co/datasets/biodatlab/ec-raft-dataset.
This raster file represents land within the Raft River Study Area classified as either “irrigated” with a cell value of 1 or “non-irrigated” with a cell value of 0 at a 10-meter spatial resolution. These classifications were determined at the pixel level by a Random Forest supervised machine learning methodology. Random Forest models are often used to classify large datasets accurately and efficiently by assigning each pixel to one of a pre-determined set of labels or groups. The model works by using decision trees that split the data based on characteristics that make the resulting groups as different from each other as possible. The model “learns” the characteristics that correlate to each label based on manually classified data points, also known as training data. A variety of data can be supplied as input to the Random Forest model for it to use in making its classification determinations. Irrigation produces distinct signals in observational data that can be identified by machine learning algorithms. Additionally, datasets that provide the model with information on landscape characteristics that often influence whether irrigation is present are also useful. This dataset was classified by the Random Forest model using United States Geological Survey (USGS) Landsat 8 and 9 Level 2, Collection 2, Tier 1 data, Harmonized Sentinel-2 Multispectral Instrument Level-2A data, USGS 3D Elevation Program (USGS 3DEP) data, and Height Above Nearest Drainage (HAND) data. Landsat 8, Landsat 9, and HAND data are at a 30-meter spatial resolution, and the Sentinel-2 and USGS 3DEP data are at a 10-meter spatial resolution. Sentinel-2 Normalized Difference Vegetation Index (NDVI) values and National Agriculture Imagery Program (NAIP) imagery from 2021 (the most recent available) were used to determine irrigation status for the manually classified training data points. Irrigated training point locations were first identified by the NAIP 2021 imagery.
Those point locations were then used to sample all available Sentinel-2 NDVI images for the 2022 growing season, and the time series at each point location was reviewed. Only points whose NDVI values remained at or above 0.6 for the majority of the growing season retained their irrigation classification. All non-irrigated training points were reviewed with Sentinel-2 NDVI and false-color imagery to ensure no new crop fields had been established in those locations during the previous year. The final model results were manually reviewed prior to release; however, no extensive ground truthing process was implemented. A wetlands mask was applied using the U.S. Fish and Wildlife Service’s National Wetlands Inventory (FWS NWI) data for areas without overlapping irrigation POUs or locations manually determined to have potential irrigation. “Speckling”, or small areas of incorrectly classified pixels, was reduced by using the Boundary Clean smoothing tool in ArcGIS with a descending sorting type.
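The NDVI screening rule for irrigated training points can be illustrated with a minimal sketch. It assumes that "the majority of the growing season" means more than half of the available NDVI observations at a point; the description does not say how the majority was measured, and the function and variable names below are hypothetical.

```python
from typing import Dict, List, Sequence

NDVI_THRESHOLD = 0.6  # threshold stated in the dataset description


def stays_irrigated(ndvi_series: Sequence[float], threshold: float = NDVI_THRESHOLD) -> bool:
    """Return True if NDVI is at or above the threshold in more than half of the observations.

    Assumption: "majority of the growing season" is read as a simple majority
    of the sampled images, not a majority of calendar days.
    """
    if not ndvi_series:
        return False  # no imagery, so the irrigated label cannot be confirmed
    above = sum(1 for value in ndvi_series if value >= threshold)
    return above > len(ndvi_series) / 2


def filter_training_points(points: Dict[str, Sequence[float]]) -> List[str]:
    """Keep the IDs of candidate irrigated points whose NDVI time series passes the rule."""
    return [point_id for point_id, series in points.items() if stays_irrigated(series)]
```

For example, a point with NDVI values `[0.7, 0.8, 0.65, 0.3]` would keep its irrigated label under this reading, while `[0.7, 0.2, 0.3, 0.65]` would not, because only half of its observations reach 0.6.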
RAFT submissions for my-raft-submission
Submitting to the leaderboard
To make a submission to the leaderboard, there are three main steps:
1. Generate predictions on the unlabeled test set of each task
2. Validate the predictions are compatible with the evaluation framework
3. Push the predictions to the Hub!
See the instructions below for more details.
Rules
To prevent overfitting to the public leaderboard, we only evaluate one submission per week. You can push… See the full description on the dataset page: https://huggingface.co/datasets/Linuxdex/my-raft-submission.
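The validation step can be pictured with a small sketch. The text above does not specify the prediction file format, so the layout checked here (a CSV with the header `ID,Label`, one row per test example) is an assumption, and `validate_predictions` is a hypothetical helper, not part of the evaluation framework. The label names in the usage note are illustrative only.

```python
import csv
import io
from typing import Iterable, List


def validate_predictions(csv_text: str, expected_ids: Iterable[int],
                         allowed_labels: Iterable[str]) -> List[str]:
    """Return a list of problems found; an empty list means these checks passed.

    Assumed format (not confirmed by the source): header "ID,Label",
    one row per test example, labels drawn from a fixed set.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != ["ID", "Label"]:
        return [f"unexpected header: {reader.fieldnames}"]

    expected = {str(i) for i in expected_ids}
    allowed = set(allowed_labels)
    seen = set()
    problems: List[str] = []

    for row_no, row in enumerate(reader, start=1):
        row_id, label = row["ID"], row["Label"]
        if row_id in seen:
            problems.append(f"row {row_no}: duplicate ID {row_id}")
        seen.add(row_id)
        if label not in allowed:
            problems.append(f"row {row_no}: unknown label {label!r}")

    missing = expected - seen
    extra = seen - expected
    if missing:
        problems.append(f"missing IDs: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected IDs: {sorted(extra)}")
    return problems
```

Calling `validate_predictions("ID,Label\n50,complaint\n51,no complaint\n", [50, 51], ["complaint", "no complaint"])` returns an empty list; a file with a repeated ID or a label outside the allowed set returns one message per problem.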
This raster file represents land within the Raft River Study Area classified as either “irrigated” with a cell value of 1 or “non-irrigated” with a cell value of 0 at a 30-meter spatial resolution. These classifications were determined at the pixel level by a Random Forest supervised machine learning methodology. Random Forest models are often used to classify large datasets accurately and efficiently by assigning each pixel to one of a pre-determined set of labels or groups. The model works by using decision trees that split the data based on characteristics that make the resulting groups as different from each other as possible. The model “learns” the characteristics that correlate to each label based on manually classified data points, also known as training data. A variety of data can be supplied as input to the Random Forest model for it to use in making its classification determinations. Irrigation produces distinct signals in observational data that can be identified by machine learning algorithms. Additionally, datasets that provide the model with information on landscape characteristics that often influence whether irrigation is present are also useful. This dataset was classified by the Random Forest model using Level 2 (surface reflectance), Collection 2, Tier 1 data from Landsat 7 and Landsat 8, Mapping Evapotranspiration with Internalized Calibration (METRIC) data produced by IDWR, United States Geological Survey National Elevation Dataset (USGS NED) data, and Height Above Nearest Drainage (HAND) data. Landsat 7, Landsat 8, METRIC, and HAND data are at a 30-meter spatial resolution, and the USGS NED data are at a 10-meter spatial resolution. 
The Cropland Data Layer (CDL) from the United States Department of Agriculture (USDA) National Agricultural Statistics Service (NASS), National Agriculture Imagery Program (NAIP) data from the USDA Farm Service Agency (FSA), Utah Water Related Land Use data from the Utah Division of Water Resources, and water rights data from IDWR were also used in determining irrigation status for the manually classified training data points but were not used for the machine learning model predictions. The final model results were manually reviewed prior to release; however, no extensive ground truthing process was implemented. “Speckling”, or small areas of incorrectly classified pixels, was reduced by masking all pixels with a slope value of 10% or greater as “non-irrigated”, regardless of the status they were assigned by the Random Forest model. Speckling within irrigated areas was reduced by a majority filter smoothing technique using a kernel of 8 nearest neighbors. A limited number of manual corrections were also made to the final results.
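The two speckle-reduction steps described above can be sketched in plain Python on a small grid. This is an illustration under stated assumptions, not the tool actually used: the majority rule here (a cell takes the value held by more than half of its up-to-8 neighbors, ties left unchanged) is one common definition, and the sketch applies it to the whole grid rather than only within irrigated areas. Function names are hypothetical.

```python
from typing import List

Grid = List[List[int]]  # 1 = irrigated, 0 = non-irrigated


def mask_steep_slopes(classes: Grid, slope_pct: List[List[float]],
                      max_slope: float = 10.0) -> Grid:
    """Set every cell whose slope is at or above max_slope percent to 0 (non-irrigated)."""
    return [
        [0 if slope >= max_slope else label for label, slope in zip(class_row, slope_row)]
        for class_row, slope_row in zip(classes, slope_pct)
    ]


def majority_filter(classes: Grid) -> Grid:
    """Replace each cell with the value held by more than half of its neighbors.

    Neighbors are the up-to-8 surrounding cells (edge cells have fewer).
    Ties leave the cell unchanged. The input grid is not modified.
    """
    rows, cols = len(classes), len(classes[0])
    result = [row[:] for row in classes]
    for r in range(rows):
        for c in range(cols):
            neighbors = [
                classes[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            ones = sum(neighbors)
            if ones * 2 > len(neighbors):
                result[r][c] = 1
            elif ones * 2 < len(neighbors):
                result[r][c] = 0
    return result
```

On a 3 by 3 grid of irrigated cells with a single non-irrigated cell in the center, `majority_filter` fills the center in, which is the kind of isolated misclassification the description calls speckling.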
phatvo/narrativeqa-test-raft dataset hosted on Hugging Face and contributed by the HF Datasets community
phatvo/narrativeqa-raft-50-p0.9 dataset hosted on Hugging Face and contributed by the HF Datasets community
raft-security-lab/robust-test-unsafe-prompts dataset hosted on Hugging Face and contributed by the HF Datasets community
This submission includes fact and logical data models for geothermal data concerning wells, fields, power plants and related analyses at Raft River, ID. The fact model is available in VizioModeler (native), html, UML, ORM-Specific, pdf, and as an XML Spy Project. An entity-relationship diagram is also included. Models are derived from tables, figures and other content in the following reports from the Raft River Geothermal Project: "Technical Report on the Raft River Geothermal Resource, Cassia County, Idaho," GeothermEx, Inc., August 2002. "Results from the Short-Term Well Testing Program at the Raft River Geothermal Field, Cassia County, Idaho," GeothermEx, Inc., October 2004.
This raster file represents land within the Raft River Study Area classified as either “irrigated” with a cell value of 1 or “non-irrigated” with a cell value of 0 at a 30-meter spatial resolution. These classifications were determined at the pixel level by a Random Forest supervised machine learning methodology. Random Forest models are often used to classify large datasets accurately and efficiently by assigning each pixel to one of a pre-determined set of labels or groups. The model works by using decision trees that split the data based on characteristics that make the resulting groups as different from each other as possible. The model “learns” the characteristics that correlate to each label based on manually classified data points, also known as training data. A variety of data can be supplied as input to the Random Forest model for it to use in making its classification determinations. Irrigation produces distinct signals in observational data that can be identified by machine learning algorithms. Additionally, datasets that provide the model with information on landscape characteristics that often influence whether irrigation is present are also useful. This dataset was classified by the Random Forest model using Level 2 (surface reflectance), Collection 2, Tier 1 data from Landsat 5 and Landsat 7, Mapping Evapotranspiration with Internalized Calibration (METRIC) data produced by IDWR, United States Geological Survey National Elevation Dataset (USGS NED) data, and Height Above Nearest Drainage (HAND) data. Landsat 5, Landsat 7, and HAND data are at a 30-meter spatial resolution, and the USGS NED data are at a 10-meter spatial resolution.
The National Land Cover Dataset (NLCD) from the USGS, National Agriculture Imagery Program (NAIP) data from the USDA Farm Service Agency (FSA), Utah Water Related Land Use data from the Utah Division of Water Resources, Mapping Evapotranspiration with Internalized Calibration (METRIC) data (where available), and water rights data from IDWR were also used in determining irrigation status for the manually classified training data points but were not used for the machine learning model predictions. The final model results were manually reviewed prior to release; however, no extensive ground truthing process was implemented. “Speckling”, or small areas of incorrectly classified pixels, was reduced by masking all pixels with a slope value of 10% or greater as “non-irrigated”, regardless of the status they were assigned by the Random Forest model. Speckling within irrigated areas was reduced by a boundary clean smoothing technique.
phatvo/medhop-50-raft dataset hosted on Hugging Face and contributed by the HF Datasets community
https://www.imrmarketreports.com/privacy-policy/
This report on the Raft Fishing Reel market summarizes the factors encouraging market growth, such as market size, market type, major regions, and end-user applications. It helps customers identify the drivers that influence and govern the market. The report describes the several types of Raft Fishing Reel products, the factors that play a major role in the growth of specific product categories, and the factors that shape the status of the market.
phatvo/THUDM_webglm-qa-train-raft dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset was generated to determine a 2017 water budget. The boundary of the study area extends from Idaho into a portion of Utah. This layer depicts polygons representing land within the Raft River Study area boundary classified as "irrigated", "non-irrigated", or "semi-irrigated", where the semi-irrigated classification typically depicts residential land. Neither irrigation status nor line work was verified by ground truthing. Field boundaries were refined using the 2017 Idaho National Agriculture Imagery Program (NAIP) imagery, Digital Ortho Photo Quadrangle (DOQQ) imagery, or other high resolution imagery. Attribute assignments for irrigation status (irrigated, non-irrigated, and semi-irrigated) are determined using available Landsat and/or Sentinel satellite imagery as background reference. Landsat imagery is typically 30-meter (Landsat 5) or 15-meter (Landsat 7) resolution. Sentinel imagery is 10-meter resolution. National Agriculture Imagery Program (NAIP) imagery, Digital Ortho Photo Quadrangle (DOQQ) imagery, and other in-house, scanned aerial imagery are used for determining irrigation status and refining the polygon geometry. The interpretation and classification process is described in detail in the report, "2006 Irrigated Land Classification for the Eastern Snake Plain Aquifer" archived on the IDWR website: Legal Actions > Delivery Call Actions > SWC > Archived Matters > Technical Working Group Documents (https://idwr.idaho.gov/legal-actions/delivery-call-actions/SWC/archived-matters.html#twg-documents).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
10,816 global import shipment records of Raft from the United States, with prices, volume, and current buyer-supplier relationships, based on an actual global import trade database.
phatvo/newsqa-raft-100-p0.9 dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Lipid rafts are dynamic membrane microdomains that orchestrate molecular interactions and are implicated in cancer development. To understand the functions of lipid rafts in cancer, we performed an integrated analysis of quantitative lipid raft proteomics data sets modeling progression in breast cancer, melanoma, and renal cell carcinoma. This analysis revealed that cancer development is associated with increased membrane raft–cytoskeleton interactions, with ∼40% of elevated lipid raft proteins being cytoskeletal components. Previous studies suggest a potential functional role for the raft–cytoskeleton in the action of the putative tumor suppressors PTRF/Cavin-1 and Merlin. To extend the observation, we examined lipid raft proteome modulation by an unrelated tumor suppressor opioid binding protein cell-adhesion molecule (OPCML) in ovarian cancer SKOV3 cells. In agreement with the other model systems, quantitative proteomics revealed that 39% of OPCML-depleted lipid raft proteins are cytoskeletal components, with microfilaments and intermediate filaments specifically down-regulated. Furthermore, protein–protein interaction network and simulation analysis showed significantly higher interactions among cancer raft proteins compared with general human raft proteins. Collectively, these results suggest increased cytoskeleton-mediated stabilization of lipid raft domains with greater molecular interactions as a common, functional, and reversible feature of cancer cells.
phatvo/hotpotqa-raft-1k dataset hosted on Hugging Face and contributed by the HF Datasets community