This dataset was created by David King_Rutgers
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
David Sacks, the White House's AI and crypto czar, calls for bolstering BIS resources to enforce chip export controls, addressing concerns over China's access to US semiconductors.
The data is sourced from the NIBRS Group A Offense Crimes dataset and covers the period from January 1, 2020, to the end of the most recent complete month. The displayed number represents the total number of crimes against persons within the specified timeframe and sector.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for "life-on-earth"
Dataset Summary
The David Attenborough Research Consortium (DARC) loves David Attenborough (DA), and therefore we aim to enrich his fantastic work using modern deep learning, generative artificial intelligence (AI) methods, and the most recent assistants like ChatGPT. Those results, together with extracted and time-stamped image frames ("frame_00000_hh-mm-ss.msmsms.jpg", ...) from the videos, constitute the darcai-life-on-earth dataset. As a first… See the full description on the dataset page: https://huggingface.co/datasets/mikehemberger/darcai-life-on-earth.
Lead in Drinking Water in Schools Test Results – David Wolfle Elementary School
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Human-AI Parallel English Corpus (HAP-E) 🙃
Purpose
The HAP-E corpus is designed for comparisons of the writing produced by humans and the writing produced by large language models (LLMs). The corpus was created by seeding an LLM with an approximately 500-word chunk of human-authored text and then prompting the model to produce an additional 500 words. Thus, a second 500-word chunk of human-authored text (what actually comes next in the original text) can be compared to… See the full description on the dataset page: https://huggingface.co/datasets/browndw/human-ai-parallel-corpus.
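The chunking step described above can be sketched in Python. This is a minimal illustration only, assuming whitespace-separated words and exactly 500 words per chunk (the card says "approximately"); `split_parallel_chunks` is a hypothetical helper, and the LLM prompting step itself is not shown.

```python
def split_parallel_chunks(text: str, chunk_words: int = 500) -> tuple[str, str]:
    """Hypothetical helper: return (prompt_chunk, human_continuation).

    prompt_chunk is the first `chunk_words` words, which would seed the LLM.
    human_continuation is the next `chunk_words` words of the original text,
    i.e. the human-authored counterpart to the model's generated continuation.
    Words are assumed to be whitespace-separated tokens.
    """
    words = text.split()
    if len(words) < 2 * chunk_words:
        raise ValueError("text is too short for two full chunks")
    prompt_chunk = " ".join(words[:chunk_words])
    human_continuation = " ".join(words[chunk_words:2 * chunk_words])
    return prompt_chunk, human_continuation
```

The first chunk would be given to the model as a prompt, and the model's roughly 500-word continuation would then be compared with the second, human-authored chunk.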
The H1B Sponsorship Trends line chart shows the number of H1B cases filed by David Dale from 2020 to 2023, providing a clear view of filing trends over time. Alongside, the horizontal bar chart titled Distribution of Job Fields Receiving H1B Sponsorship breaks down which roles and industries are most commonly sponsored.
The PERM Sponsorship Trends line chart visualizes the number of PERM cases filed by David Antonio from 2020 to 2023, highlighting the company’s long-term sponsorship patterns. The horizontal bar chart titled Distribution of Job Fields Receiving PERM Sponsorship further categorizes sponsored roles by job type.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Human-AI Parallel English Corpus Mini (HAP-E mini) 🙃
Purpose
This is a down-sampled version of the HAP-E corpus. Please read the HAP-E data card for detailed information about how the full corpus was created. This smaller version of the corpus was created to facilitate smaller scale explorations of the data (in classrooms, workshops, etc.). Note that in down-sampling the data, the parallel nature of the corpus was maintained. There is, for example, a text chunk for the… See the full description on the dataset page: https://huggingface.co/datasets/browndw/human-ai-parallel-corpus-mini.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Explore Artificial intelligence and intelligent systems : the implications through data • Key facts: author, publication date, book publisher, book series, book subjects • Real-time news, visualizations and datasets
Data has been processed by NODC to the NODC standard Bathythermograph (XBT) (C116) format. The C116/C118 format contains temperature-depth profile data obtained using expendable bathythermograph (XBT) instruments. Cruise information, position, date and time were reported for each observation. The data record consisted of pairs of temperature-depth values. Unlike the MBT Data File, in which temperature values were recorded at uniform 5 m intervals, the XBT data files contained temperature values at non-uniform depths. These depths were recorded at the minimum number of points ("inflection points") required to accurately define the temperature curve. Standard XBTs can obtain profiles to depths of either 450 or 760 m. With special instruments, measurements can be obtained to 1830 m. Prior to July 1994, XBT data were routinely processed to one of these standard types. XBT data are now processed and loaded directly into the NODC Ocean Profile Data Base (OPDB). Historic data from these two data types were loaded into the OPDB.
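To illustrate how a profile stored only at inflection points can be read back at arbitrary depths, or resampled onto a uniform 5 m grid like the MBT files, here is a minimal Python sketch. It assumes the record has already been decoded into (depth in m, temperature in °C) pairs sorted by increasing depth (the fixed-width C116 layout is not reproduced here) and that temperature varies linearly between inflection points. The function names and the sample profile are hypothetical.

```python
import math
from bisect import bisect_left


def temperature_at(profile: list[tuple[float, float]], depth_m: float) -> float:
    """Linearly interpolate temperature at depth_m from (depth_m, temp_c)
    inflection-point pairs sorted by increasing depth (assumed layout)."""
    depths = [d for d, _ in profile]
    if not depths or depth_m < depths[0] or depth_m > depths[-1]:
        raise ValueError("depth outside the recorded profile")
    i = bisect_left(depths, depth_m)  # first index with depths[i] >= depth_m
    if depths[i] == depth_m:
        return profile[i][1]  # exact hit on an inflection point
    (d0, t0), (d1, t1) = profile[i - 1], profile[i]
    return t0 + (t1 - t0) * (depth_m - d0) / (d1 - d0)


def resample_uniform(profile: list[tuple[float, float]],
                     step_m: float = 5.0) -> list[tuple[float, float]]:
    """Resample the profile onto multiples of step_m that lie inside its depth range."""
    first, last = profile[0][0], profile[-1][0]
    k0 = math.ceil(first / step_m)
    k1 = math.floor(last / step_m)
    return [(k * step_m, temperature_at(profile, k * step_m)) for k in range(k0, k1 + 1)]


# Made-up example profile to a 450 m standard XBT depth (values are illustrative only).
example = [(0.0, 24.0), (40.0, 22.0), (100.0, 15.0), (450.0, 8.0)]
grid = resample_uniform(example)
```

Linear interpolation is chosen here because the inflection points are, by definition, the points needed to reconstruct the curve; a different interpolation scheme may be more appropriate depending on how the original digitization was done.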
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Explore Nvidia's AI chips inspired by historical figures, showcasing cutting-edge technology and a nod to scientific pioneers.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
FIND is an interactive dataset for evaluating AI interpretability methods on black box functions.
This dataset contains all function files for the FIND benchmark and JSON files with associated metadata. The utilities provided in the associated FIND GitHub Repository support running and evaluating interpretation of the functions with user-defined interpreters.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As large language models (LLMs) such as GPT have become more accessible, concerns about their potential effects on students’ learning have grown. In data science education, the specter of students’ turning to LLMs raises multiple issues, as writing is a means not just of conveying information but of developing statistical reasoning. In our study, we engage with questions surrounding LLMs and their pedagogical impact by: 1) quantitatively and qualitatively describing how select LLMs write report introductions and complete data analysis reports; and 2) comparing patterns in texts authored by LLMs to those authored by students and by published researchers. Our results show distinct differences between machine-generated and human-generated writing, as well as between novice and expert writing. Those differences are evident in how writers manage information, modulate confidence, signal importance, and report statistics. The findings can help inform classroom instruction, whether that instruction is aimed at dissuading the use of LLMs or at guiding their use as a productivity tool. They also have implications for students’ development as statistical thinkers and writers. What happens when they offload the work of data science to a model that doesn’t write quite like a data scientist?
The H1B Sponsorship Trends line chart shows the number of H1B cases filed by David Ryuman from 2020 to 2023, providing a clear view of filing trends over time. Alongside, the horizontal bar chart titled Distribution of Job Fields Receiving H1B Sponsorship breaks down which roles and industries are most commonly sponsored.
The H1B Sponsorship Trends line chart shows the number of H1B cases filed by David Yadegar from 2020 to 2023, providing a clear view of filing trends over time. Alongside, the horizontal bar chart titled Distribution of Job Fields Receiving H1B Sponsorship breaks down which roles and industries are most commonly sponsored.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains trained model weights, scalers, and evaluation metrics for the winter precipitation-type models trained as part of the paper "Evidential Deep Learning: Enhancing Predictive Uncertainty Estimation for Earth System Science Applications".
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Counting the number of times a patient coughs per day is an essential biomarker in determining treatment efficacy for novel antitussive therapies and personalizing patient care. There is a need for wearable devices that employ multimodal sensors to run accurate, privacy-preserving, automatic cough-counting algorithms directly on the device in an edge-AI fashion. To advance this research field, we contribute the first publicly accessible cough counting dataset of multimodal biosignals. The database contains nearly 4 hours of biosignal data, with both acoustic and kinematic modalities, covering 4,300 annotated cough events. Furthermore, several non-cough sounds (e.g. breathing, laughing, and throat clearing), background noises (e.g. music, traffic, bystander coughing) and motion scenarios (e.g. sitting, walking) mimicking daily life activities are also present, which the research community can use to accelerate ML algorithm development.
For detailed information about using this dataset to train edge-AI models and example code, please refer to our public Git repository: https://github.com/esl-epfl/edge-ai-cough-count/
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Britain risks falling behind in AI data centre development due to nuclear power constraints, as Nvidia's David Hogan warns. Explore the challenges and potential solutions.
A conductivity-temperature-depth (CTD) probe was used to collect data from NOAA Ship DAVID STARR JORDAN. The data were collected in the NE Pacific (limit-180) over roughly one month, from August 20, 1969 to September 17, 1969, by the National Marine Fisheries Service, La Jolla, CA. Data has been processed by NODC to the NODC standard High-Resolution CTD/STD (F022) format. The F022 format contains high-resolution data collected using CTD (conductivity-temperature-depth) and STD (salinity-temperature-depth) instruments. As they are lowered and raised in the oceans, these electronic devices provide nearly continuous profiles of temperature, salinity, and other parameters. Data values may be subject to averaging or filtering or obtained by interpolation and may be reported at depth intervals as fine as 1 m. Cruise and instrument information, position, date, time and sampling interval are reported for each station. Environmental data at the time of the cast (meteorological and sea surface conditions) may also be reported. The data record comprises values of temperature, salinity or conductivity, density (computed sigma-t), and possibly dissolved oxygen or transmissivity at specified depth or pressure levels. Data may be reported at either equally or unequally spaced depth or pressure intervals. A text record is available for comments.