License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Blockchain data query: Sanity check: Dexscreener FDV and Liquidity aren't found but pool is resolved
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Blockchain data query: NFT Trades VeeFriends Sanity Check
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Blockchain data query: usdc/cbbtc fee function calls, single day sanity check
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Blockchain data query: aerodrome swap event v. getswapfee calls sanity check
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
In this hackathon, the goal is to create a machine learning model that extracts entity values from images. This capability is crucial in fields like healthcare, e-commerce, and content moderation, where precise product information is vital. As digital marketplaces expand, many products lack detailed textual descriptions, making it essential to obtain key details directly from images. These images carry important information such as weight, volume, voltage, wattage, and dimensions, all of which are critical for digital stores.
The dataset consists of the following columns:

1. **index:** A unique identifier (ID) for the data sample.
2. **image_link:** Public URL where the product image is available for download. Example: https://m.media-amazon.com/images/I/71XfHPR36-L.jpg. To download images, use the download_images function from src/utils.py. See sample code in src/test.ipynb.
3. **group_id:** Category code of the product.
4. **entity_name:** Product entity name. For example, "item_weight".
5. **entity_value:** Product entity value. For example, "34 gram". Note: test.csv does not contain the entity_value column, as it is the target variable.
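The provided download_images helper in src/utils.py is the intended way to fetch images. Purely as an illustration, a minimal stand-in (the function names here are hypothetical, not the repository's actual API) could look like:

```python
import os
import urllib.request
from urllib.parse import urlparse

def local_name(image_link: str) -> str:
    """Derive a local filename from the last path segment of the image URL."""
    return os.path.basename(urlparse(image_link).path)

def download_image(image_link: str, out_dir: str = "images") -> str:
    """Fetch one product image into out_dir and return the saved path.

    Hypothetical sketch; requires network access when actually run.
    """
    os.makedirs(out_dir, exist_ok=True)
    dest = os.path.join(out_dir, local_name(image_link))
    if not os.path.exists(dest):  # skip files that were already downloaded
        urllib.request.urlretrieve(image_link, dest)
    return dest
```

In practice you would wrap calls like this in retry logic and parallelize across the dataset, which is presumably what the provided helper already does.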
Output Format:

1. **index:** The unique identifier (ID) of the data sample. The index must match the corresponding test record index.
2. **prediction:** A string in the format "x unit", where x is a float in standard formatting and unit is one of the allowed units (listed in the Appendix), with a single space between the two. Valid examples: "2 gram", "12.5 centimetre", "2.56 ounce". Invalid examples: "2 gms", "60 ounce/1.7 kilogram", "2.2e2 kilogram".

Note: Output a prediction for every index. If no value is found in the image for a test sample, return an empty string, i.e., "". If the output file contains fewer or more rows than test.csv, it will not be evaluated.
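The "x unit" rule can be checked mechanically before submission. A sketch, using a small stand-in for the allowed-unit list (the real list lives in src/constants.py, and src/sanity.py is the official checker):

```python
import re

# Hypothetical subset of the allowed units; the authoritative list is
# defined in src/constants.py, not here.
ALLOWED_UNITS = {"gram", "kilogram", "ounce", "centimetre", "millilitre"}

# "x unit": a plain decimal number (no scientific notation), exactly one
# space, then a unit made of lowercase letters (possibly multi-word).
PREDICTION_RE = re.compile(r"^(\d+(\.\d+)?) ([a-z ]+)$")

def is_valid_prediction(pred: str) -> bool:
    """Empty string is allowed (no value found); otherwise require 'x unit'."""
    if pred == "":
        return True
    m = PREDICTION_RE.match(pred)
    return bool(m) and m.group(3) in ALLOWED_UNITS
```

Note how the regex rejects the invalid cases above: "2 gms" fails the unit lookup, "2.2e2 kilogram" fails the number pattern, and "60 ounce/1.7 kilogram" contains a character outside the unit alphabet.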
Source Files:

1. **src/sanity.py:** Sanity checker that ensures the final output file passes all formatting checks. Note: the script does not check whether fewer or more predictions are present than in the test file. See sample code in src/test.ipynb.
2. **src/utils.py:** Helper functions for downloading images from image_link.
3. **src/constants.py:** The allowed units for each entity type.
4. **sample_code.py:** A sample dummy script that generates an output file in the given format. Using this file is optional.
Dataset Files:

1. **dataset/train.csv:** Training file with labels (entity_value).
2. **dataset/test.csv:** Test file without output labels (entity_value). Generate predictions on this file with your model and format the output file to match sample_test_out.csv (refer to the "Output Format" section above).
3. **dataset/sample_test.csv:** Sample test input file.
4. **dataset/sample_test_out.csv:** Sample outputs for sample_test.csv. The output for test.csv must be formatted in exactly the same way. Note: the predictions in this file are not necessarily correct.
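Emitting the output file is mechanical once predictions exist. A minimal sketch, assuming sample_test_out.csv uses a two-column index,prediction layout (check the actual sample file to confirm):

```python
import csv

def write_output(predictions: dict[int, str], path: str = "test_out.csv") -> None:
    """Write one row per test index, sorted by index.

    predictions maps each test index to its "x unit" string, or to ""
    when no value was found in the image.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["index", "prediction"])
        for idx in sorted(predictions):
            writer.writerow([idx, predictions[idx]])
```

Building the file from a dict keyed by every test index makes it harder to accidentally drop or duplicate rows, which would disqualify the file from evaluation.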
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for Dataset Name
Dataset Summary
This is a tiny version of the RedPajama dataset. It contains 64 samples from each of the 7 sources. The dataset is intended for developing and testing data and training pipelines that load the full RedPajama dataset, or any general HuggingFace dataset. It is very fast to download and easy to examine. It should not be used to train a full model, but it works well for overfitting tests and other sanity checks. See the full description on the dataset page: https://huggingface.co/datasets/severo/RedPajama-Tiny.