Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning
Code: https://github.com/GAIR-NLP/MegaScience
Project Page: https://huggingface.co/MegaScience
MegaScience is a large-scale mixture of high-quality open-source datasets consisting of 1.25 million instances. We first collect multiple public datasets, then conduct comprehensive ablation studies across different data selection methods to identify the optimal approach for each dataset, thereby… See the full description on the dataset page: https://huggingface.co/datasets/MegaScience/MegaScience.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning
Dataset Description
Scientific reasoning is critical for developing AI scientists and supporting human researchers in advancing the frontiers of natural science discovery. However, the open-source community has primarily focused on mathematics and coding while neglecting the scientific domain, largely due to the absence of open, large-scale, high-quality, verifiable scientific reasoning… See the full description on the dataset page: https://huggingface.co/datasets/MegaScience/TextbookReasoning.
seba/MegaScience-Qwen3-Tokenized dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The credit report of Mega Science Co Ltd contains unique and detailed export-import market intelligence, including its phone, email, and LinkedIn details, along with details of each import and export shipment such as product, quantity, price, buyer and supplier names, country, and date of shipment.
GBIF, the Global Biodiversity Information Facility, is an international network and data infrastructure funded by the world's governments and aimed at providing anyone, anywhere, open access to data about all types of life on Earth. Coordinated through its Secretariat in Copenhagen, the GBIF network of participating countries and organizations, working through participant nodes, provides data-holding institutions around the world with common standards and open-source tools that enable them to share information about where and when species have been recorded. This knowledge derives from many sources, ranging from museum specimens collected in the 18th and 19th centuries to geotagged smartphone photos shared by amateur naturalists in recent days and weeks. The GBIF network draws all these sources together through the use of data standards, such as Darwin Core, which forms the basis for the bulk of GBIF.org's index of hundreds of millions of species occurrence records. Publishers provide open access to their datasets using machine-readable Creative Commons licence designations, allowing scientists, researchers and others to apply the data in hundreds of peer-reviewed publications and policy papers each year. Many of these analyses, which cover topics from the impacts of climate change and the spread of invasive and alien pests to priorities for conservation and protected areas, food security and human health, would not be possible without this open access to data. GBIF arose from a 1999 recommendation by the Biodiversity Informatics Subgroup of the Organisation for Economic Co-operation and Development's Megascience Forum. This report concluded that "An international mechanism is needed to make biodiversity data and information accessible worldwide", arguing that this mechanism could produce many economic and social benefits and enable sustainable development by providing sound scientific evidence.