Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LS NANO 281R lab 3 kaggle.json file.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset can be recreated with two notebook code blocks, and updated with new Hugging Face data as well. Before running the two cells, replace "fill-me-in" with your Kaggle API key and create a blank Kaggle dataset (as "FAISS-SentenceTransformers-AIMO" originally was).
Cell Block 1:

!pip install -q kaggle
import os
import json
from kaggle.api.kaggle_api_extended import KaggleApi
os.makedirs('/root/.kaggle', exist_ok=True)
!echo '{"username":"thomasgamet","key":"fill-me-in"}' > /root/.kaggle/kaggle.json
!chmod 600 /root/.kaggle/kaggle.json
os.makedirs('/kaggle/working/packages', exist_ok=True)
!pip download -d /kaggle/working/packages sentence-transformers faiss-cpu
api = KaggleApi()
api.authenticate()
Cell Block 2:

dataset_metadata = {
    "title": "FAISS-SentenceTransformers-AIMO",
    "id": "thomasgamet/faiss-sentencetransformers-aimo",
    "licenses": [{"name": "apache-2.0"}]
}
with open('/kaggle/working/packages/dataset-metadata.json', 'w') as f:
    json.dump(dataset_metadata, f, indent=4)
api.dataset_create_version('/kaggle/working/packages', version_notes="Initial version", dir_mode='tar')
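Before uploading, a quick sanity check of dataset-metadata.json can catch the most common failure, a malformed "id". A minimal sketch, assuming only the documented required fields ("title", "id", "licenses"); the helper name is hypothetical:

```python
import json

def validate_dataset_metadata(path):
    """Check that a Kaggle dataset-metadata.json has the expected fields."""
    with open(path) as f:
        meta = json.load(f)
    problems = []
    for field in ("title", "id", "licenses"):
        if field not in meta:
            problems.append(f"missing field: {field}")
    # The id must look like "username/dataset-slug".
    if meta.get("id", "").count("/") != 1:
        problems.append("id should be 'username/dataset-slug'")
    return problems

# Example: write metadata like Cell Block 2 does, then validate it.
metadata = {
    "title": "FAISS-SentenceTransformers-AIMO",
    "id": "thomasgamet/faiss-sentencetransformers-aimo",
    "licenses": [{"name": "apache-2.0"}],
}
with open("dataset-metadata.json", "w") as f:
    json.dump(metadata, f, indent=4)
print(validate_dataset_metadata("dataset-metadata.json"))  # []
```

Running this before dataset_create_version avoids a failed upload over a typo in the metadata file.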
Used by: https://www.kaggle.com/code/thomasgamet/updated-code-interpretation-rag-based-1shot-shared
CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by ahmed938ali
Released under CC0: Public Domain
This dataset was created by Dalix56
This dataset was created by Mukesh Maji
This dataset was created by Shu Murase
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Doodleverse/Segmentation Zoo/Seg2Map Res-UNet models for DeepGlobe/7-class segmentation of RGB 512x512 high-resolution images
These Residual-UNet model data are based on the DeepGlobe dataset
Models have been created using Segmentation Gym* using the following dataset**: https://www.kaggle.com/datasets/balraj98/deepglobe-land-cover-classification-dataset
Image size used by model: 512 x 512 x 3 pixels
classes: 1. urban, 2. agricultural, 3. rangeland, 4. forest, 5. water, 6. bare, 7. unknown
File descriptions
For each model, there are 5 files with the same root name:
'.json' config file: this is the file that was used by Segmentation Gym* to create the weights file. It contains instructions for how to make the model and the data it used, as well as instructions for how to use the model for prediction. It is a handy wee thing and mastering it means mastering the entire Doodleverse.
'.h5' weights file: this is the file that was created by the Segmentation Gym* function train_model.py. It contains the trained model's parameter weights. It can be called by the Segmentation Gym* function seg_images_in_folder.py. Models may be ensembled.
'_modelcard.json' model card file: this is a json file containing fields that collectively describe the model origins, training choices, and dataset that the model is based upon. There is some redundancy between this file and the config file (described above) that contains the instructions for the model training and implementation. The model card file is not used by the program, but it is important metadata, so it should be kept with the other files that collectively make up the model; as such it is considered part of the model.
'_model_history.npz' model training history file: this numpy archive file contains numpy arrays describing the training and validation losses and metrics. It is created by the Segmentation Gym function train_model.py
'.png' model training loss and mean IoU plot: this png file contains plots of training and validation losses and mean IoU scores during model training (a subset of the data inside the .npz file). It is created by the Segmentation Gym function train_model.py
Additionally, BEST_MODEL.txt contains the name of the model with the best validation loss and mean IoU.
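Since each model is spread across five sibling files (plus BEST_MODEL.txt), a small helper that groups files by their shared root name makes it easy to verify a download is complete. A minimal sketch; the file names below are hypothetical:

```python
from collections import defaultdict

# The five per-model suffixes described above.
EXPECTED_SUFFIXES = {".json", ".h5", "_modelcard.json", "_model_history.npz", ".png"}

def group_model_files(filenames):
    """Group per-model files by root name; longest suffix is matched first
    so '_modelcard.json' is not mistaken for a plain '.json' config."""
    groups = defaultdict(set)
    for name in filenames:
        for suffix in sorted(EXPECTED_SUFFIXES, key=len, reverse=True):
            if name.endswith(suffix):
                groups[name[: -len(suffix)]].add(suffix)
                break
    return groups

files = [
    "resunet_deepglobe.json",
    "resunet_deepglobe.h5",
    "resunet_deepglobe_modelcard.json",
    "resunet_deepglobe_model_history.npz",
    "resunet_deepglobe.png",
    "BEST_MODEL.txt",  # not part of any single model's file set
]
groups = group_model_files(files)
for root, found in groups.items():
    missing = EXPECTED_SUFFIXES - found
    print(root, "complete" if not missing else f"missing: {missing}")
# resunet_deepglobe complete
```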
References *Segmentation Gym: Buscombe, D., & Goldstein, E. B. (2022). A reproducible and reusable pipeline for segmentation of geoscientific imagery. Earth and Space Science, 9, e2022EA002332. https://doi.org/10.1029/2022EA002332 See: https://github.com/Doodleverse/segmentation_gym
**Demir, I., Koperski, K., Lindenbaum, D., Pang, G., Huang, J., Basu, S., Hughes, F., Tuia, D. and Raskar, R., 2018. Deepglobe 2018: A challenge to parse the earth through satellite images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 172-181).
This dataset was created by God Abeg
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Sahal Mulki
Released under MIT
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Aman Kumar Jha
Released under Apache 2.0
This dataset was created by huyendao123
This dataset was created by Adarsha Pratap Adhikari
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Yelp Dataset JSON
Each file is composed of a single object type, one JSON object per line.
Take a look at some examples to get you started: https://github.com/Yelp/dataset-examples.
Note: the following examples contain inline comments, which are technically not valid JSON. This is done here to simplify the documentation and explain the structure; the JSON files you download will not contain any comments and will be fully valid JSON.
business.json
Contains business data including location data, attributes, and categories.
{
// string, 22 character unique string business id
"business_id": "tnhfDv5Il8EaGSXZGiuQGg",
// string, the business's name
"name": "Garaje",
// string, the full address of the business
"address": "475 3rd St",
// string, the city
"city": "San Francisco",
// string, 2 character state code, if applicable
"state": "CA",
// string, the postal code
"postal code": "94107",
// float, latitude
"latitude": 37.7817529521,
// float, longitude
"longitude": -122.39612197,
// float, star rating, rounded to half-stars
"stars": 4.5,
// integer, number of reviews
"review_count": 1198,
// integer, 0 or 1 for closed or open, respectively
"is_open": 1,
// object, business attributes to values. note: some attribute values might be objects
"attributes": {
"RestaurantsTakeOut": true,
"BusinessParking": {
"garage": false,
"street": true,
"validated": false,
"lot": false,
"valet": false
}
},
// an array of strings of business categories
"categories": [
"Mexican",
"Burgers",
"Gastropubs"
],
// an object of key day to value hours, hours are using a 24hr clock
"hours": {
"Monday": "10:00-21:00",
"Tuesday": "10:00-21:00",
"Friday": "10:00-21:00",
"Wednesday": "10:00-21:00",
"Thursday": "10:00-21:00",
"Sunday": "11:00-18:00",
"Saturday": "10:00-21:00"
}
}
review.json
Contains full review text data including the user_id that wrote the review and the business_id the review is written for.
{
// string, 22 character unique review id
"review_id": "zdSx_SD6obEhz9VrW9uAWA",
// string, 22 character unique user id, maps to the user in user.json
"user_id": "Ha3iJu77CxlrFm-vQRs_8g",
// string, 22 character business id, maps to business in business.json
"business_id": "tnhfDv5Il8EaGSXZGiuQGg",
// integer, star rating
"stars": 4,
// string, date formatted YYYY-MM-DD
"date": "2016-03-09",
// string, the review itself
"text": "Great place to hang out after work: the prices are decent, and the ambience is fun. It's a bit loud, but very lively. The staff is friendly, and the food is good. They have a good selection of drinks.",
// integer, number of useful votes received
"useful": 0,
// integer, number of funny votes received
"funny": 0,
// integer, number of cool votes received
"cool": 0
}
user.json
User data including the user's friend mapping and all the metadata associated with the user.
{
// string, 22 character unique user id, maps to the user in user.json
"user_id": "Ha3iJu77CxlrFm-vQRs_8g",
// string, the user's first name
"name": "Sebastien",
// integer, the number of reviews they've written
"review_count": 56,
// string, when the user joined Yelp, formatted like YYYY-MM-DD
"yelping_since": "2011-01-01",
// array of strings, the user's friends as user_ids
"friends": [
"wqoXYLWmpkEH0YvTmHBsJQ",
"KUXLLiJGrjtSsapmxmpvTA",
"6e9rJKQC3n0RSKyHLViL-Q"
],
// integer, number of useful votes sent by the user
"useful": 21,
// integer, number of funny votes sent by the user
"funny": 88,
// integer, number of cool votes sent by the user
"cool": 15,
// integer, number of fans the user has
"fans": 1032,
// array of integers, the years the user was elite
"elite": [
2012,
2013
],
// float, average rating of all reviews
"average_stars": 4.31,
// integer, number of hot compliments received by the user
"compliment_hot": 339,
// integer, number of more compliments received by the user
"compliment_more": 668,
// integer, number of profile compliments received by the user
"compliment_profile": 42,
// integer, number of cute compliments received by the user
"compliment_cute": 62,
// integer, number of list compliments received by the user
"compliment_list": 37,
// integer, number of note compliments received by the user
"compliment_note": 356,
// integer, number of plain compliments received by the user
"compliment_plain": 68,
// integer, number of coo...
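Because each file holds one JSON object per line, the natural way to read these files is to stream them with json.loads line by line rather than json.load on the whole file. A minimal sketch; the sample line is abridged from the business.json fields shown above:

```python
import json

def iter_jsonl(path):
    """Yield one parsed object per line of a Yelp dataset file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Example: a single business.json-style line (abridged from the docs above).
sample = '{"business_id": "tnhfDv5Il8EaGSXZGiuQGg", "name": "Garaje", "stars": 4.5, "is_open": 1}'
with open("business_sample.json", "w", encoding="utf-8") as f:
    f.write(sample + "\n")

for biz in iter_jsonl("business_sample.json"):
    print(biz["name"], biz["stars"])  # Garaje 4.5
```

Streaming this way keeps memory flat even for the multi-gigabyte review.json file.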
CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Dhrumil Patel
Released under CC0: Public Domain
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Sujithkumar M
Released under Apache 2.0
This dataset was created by qiudong
CDLA Permissive 1.0: https://cdla.io/permissive-1-0/
JSON file that can be imported into some XBRL-based financial report creation tools, which then convert the information into the XBRL global standard format. Tools that support this format include Auditchain Suite and General Luca.
For more information about SFAC6, see the SFAC6 XBRL-based report model.
This dataset was created by Lex Wayne
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This is a dataset of step-by-step instructions extracted from wikiHow and represented in JSON format. This dataset contains 132,754 articles (step-by-step instructions), with an average of 9.21 steps each.
For more information on this type of data, see previous versions of this dataset on github, datahub and kaggle.
This dataset consists of 26 JSON files, each one containing a set of JSON objects representing an instructional article. The fields of each object type are described below.
Article object:
MainTask: The title of the main task.
URL: The URL of the article.
Time: Timestamp of when the article was viewed.
Views: The number of views on the page.
AuthorsCount: The number of authors that edited the page.
MainTaskSummary: A summary description of the main task.
Steps: If the article has no methods or parts, this is the list of step objects.
Methods: The list of methods (if any), each one having its own list of step objects.
Parts: The list of parts (if any), each one having its own list of step objects.
Categories: The categories this article belongs to, from generic to specific.
Ingredients: The list of ingredients (if any).
Requirements: The list of things needed (if any).
Tips: The list of tips (if any).
QnA: The list of QnA objects (if any).
Method object:
MethodName: The name of the method.
Steps: The list of step objects for this method.
Part object:
PartName: The name of the part.
Steps: The list of step objects for this part.
QnA object:
Question: The question.
Answer: The answer.
Step object:
Headline: The first, bold-emphasised, sentence describing the step.
Description: The complete/detailed description of the step (if any) that follows the headline.
Links: A list of HTML links present in the step.
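Since step objects can live at the article's top level or inside its Methods or Parts, a small helper that flattens them is convenient. A minimal sketch against the field names above; the sample article is invented:

```python
def flatten_steps(article):
    """Return all step objects of an article, whether stored under
    the top-level Steps key or nested inside Methods or Parts."""
    steps = list(article.get("Steps", []))
    for method in article.get("Methods", []):
        steps.extend(method.get("Steps", []))
    for part in article.get("Parts", []):
        steps.extend(part.get("Steps", []))
    return steps

# Invented example article using the documented field names.
article = {
    "MainTask": "How to Boil an Egg",
    "Methods": [
        {
            "MethodName": "Hard-Boiling",
            "Steps": [
                {"Headline": "Place eggs in a pot."},
                {"Headline": "Boil for 10 minutes."},
            ],
        },
    ],
}
print(len(flatten_steps(article)))  # 2
```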
This dataset was created by Kirti Sikka
Released under Other (specified in description)