MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
CSV-formatted dataset for the ARC-AGI challenge, provided because the original dataset was in JSON format.
Both files are formatted as Id, Input, Output. Id contains the task id and a train or test label, together with the example's position within the task. Input contains the input and Output contains the output.
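As a quick illustration of this layout, a minimal Python sketch (the file name train.csv is hypothetical; substitute the actual CSV from the dataset):
import pandas as pd

# Read one of the CSV files; columns are Id, Input, Output as described above.
df = pd.read_csv("train.csv")  # hypothetical file name
print(df.columns.tolist())
print(df.head())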
This dataset was created by Ahmad Fakhar.
The following datafiles contain detailed information about vehicles in the UK, which would be too large to use as structured tables. They are provided as simple CSV text files that should be easier to use digitally.
Data tables containing aggregated information about vehicles in the UK are also available.
We welcome any feedback on the structure of our new datafiles, their usability, or any suggestions for improvements; please contact vehicles statistics.
CSV files can be used either as a spreadsheet (using Microsoft Excel or similar spreadsheet packages) or digitally using software packages and languages (for example, R or Python).
When used as a spreadsheet, there will be no formatting, but the file can still be explored like our publication tables. Due to their size, older software might not be able to open the entire file.
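For example, a minimal Python sketch for reading one of the larger files in manageable chunks (it assumes the file has been downloaded locally as df_VEH0120_GB.csv; BodyType is one of the schema columns listed below):
import pandas as pd

# Read the large quarterly-stock file in chunks to limit memory use.
chunks = pd.read_csv("df_VEH0120_GB.csv", chunksize=100_000)

# Example: count rows per BodyType across the whole file.
counts = None
for chunk in chunks:
    c = chunk["BodyType"].value_counts()
    counts = c if counts is None else counts.add(c, fill_value=0)
print(counts)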
df_VEH0120_GB: Vehicles at the end of the quarter by licence status, body type, make, generic model and model: Great Britain (CSV, 37.6 MB) - https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077520/df_VEH0120_GB.csv
Scope: All registered vehicles in Great Britain; from 1994 Quarter 4 (end December)
Schema: BodyType, Make, GenModel, Model, LicenceStatus, [number of vehicles; one column per quarter]
df_VEH0120_UK: Vehicles at the end of the quarter by licence status, body type, make, generic model and model: United Kingdom (CSV, 20.8 MB) - https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077521/df_VEH0120_UK.csv
Scope: All registered vehicles in the United Kingdom; from 2014 Quarter 3 (end September)
Schema: BodyType, Make, GenModel, Model, LicenceStatus, [number of vehicles; one column per quarter]
df_VEH0160_GB: Vehicles registered for the first time by body type, make, generic model and model: Great Britain (CSV, 17.1 MB) - https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077522/df_VEH0160_GB.csv
Scope: All vehicles registered for the first time in Great Britain; from 2001 Quarter 1 (January to March)
Schema: BodyType, Make, GenModel, Model, [number of vehicles; one column per quarter]
df_VEH0160_UK: Vehicles registered for the first time by body type, make, generic model and model: United Kingdom (CSV, 4.93 MB) - https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077523/df_VEH0160_UK.csv
Scope: All vehicles registered for the first time in the United Kingdom; from 2014 Quarter 3 (July to September)
Schema: BodyType, Make, GenModel, Model, [number of vehicles; one column per quarter]
df_VEH0124: Vehicles at the end of the quarter by licence status, body type, make, generic model, model, year of first use and year of manufacture: United Kingdom (CSV, 28.2 MB) - https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1077524/df_VEH0124.csv
Scope: All licensed vehicles in the United Kingdom; 2021 Quarter 4 (end December) only
Schema: BodyType, Make, GenModel, Model, YearFirstUsed, YearManufacture, Licensed (number of vehicles), SORN (number of vehicles)
df_VEH0220:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LifeSnaps Dataset Documentation
Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in-the-wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements, due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.
The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.
Data Import: Reading CSV
For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
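For instance, a minimal sketch (the file name daily_fitbit.csv is hypothetical; substitute the CSV file you downloaded):
import pandas as pd

# Load one of the daily-granularity CSV files into a DataFrame.
daily = pd.read_csv("daily_fitbit.csv")  # hypothetical file name
print(daily.shape)
print(daily.head())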
Data Import: Setting up a MongoDB (Recommended)
To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.
To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have the MongoDB Database Tools installed (available from the MongoDB website).
For the Fitbit data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c fitbit
For the SEMA data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c sema
For surveys data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c surveys
If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
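For example (the user name, password, and authentication database below are placeholders):
mongorestore --host localhost:27017 --username <user> --password <password> --authenticationDatabase admin -d rais_anonymized -c fitbit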
Data Availability
The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain information related to these collections. Each document in any collection follows the format shown below:
{
_id:
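To inspect documents in the restored database, a minimal pymongo sketch (assuming MongoDB is running locally and the collections were restored as above):
from pymongo import MongoClient

# Connect to the locally restored database and inspect one Fitbit document.
client = MongoClient("mongodb://localhost:27017/")
db = client["rais_anonymized"]
print(db["fitbit"].find_one())
print(db["fitbit"].count_documents({}))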
This data release provides data in support of an assessment of water quality and discharge in the Herring River at the Chequessett Neck Road dike in Wellfleet, Massachusetts, from November 2015 to September 2017. The assessment was a cooperative project among the U.S. Geological Survey, National Park Service, Cape Cod National Seashore, and the Friends of Herring River to characterize environmental conditions prior to a future removal of the dike. It is described in U.S. Geological Survey (USGS) Scientific Investigations Report "Assessment of Water Quality and Discharge in the Herring River, Wellfleet, Massachusetts, November 2015 – September 2017." This data release is structured as a set of comma-separated values (CSV) files, each of which contains information on data source (or laboratory used for analysis), USGS site identification (ID) number, beginning date and time of observation or sampling, ending date and time of observation or sampling, and data such as flow rate and analytical results. The CSV files include calculated tidal daily flows (Flood_Tide_Tidal_Day.csv and Ebb_Tide_Tidal_Day.csv) that were used in Huntington and others (2020) for estimation of nutrient loads. Tidal daily flows are the estimated mean daily discharges for two consecutive flood and ebb tide cycles (average duration: 24 hours, 48 minutes). The associated date is the day on which most of the flow occurred. CSV files contain quality assurance data for water-quality samples including blanks (Blanks.csv), replicates (Replicates.csv), standard reference materials (Standard_Reference_Material.csv), and atmospheric ammonium contamination (NH4_Atmospheric_Contamination.csv). One CSV file (EWI_vs_ISCO.csv) contains data comparing composite samples collected by an automatic sampler (ISCO) at a fixed point with depth-integrated samples collected at equal width increments (EWI). One CSV file (Cross_Section_Field_Parameters.csv) contains field parameter data (specific conductance, temperature, pH, and dissolved oxygen) collected at a fixed location and data collected along the cross sections at variable water depths and horizontal distances across the openings of the culverts at the Chequessett Neck Road dike. One CSV file (LOADEST_Bias_Statistics.csv) contains data that include estimated natural log of load, model residuals, Z-scores, and seasonal model residuals for winter (December, January, and February); spring (March, April and May); summer (June, July and August); and fall (September, October, and November). The data release also includes a data dictionary (Data_Dictionary.csv) that provides detailed descriptions of each field in each CSV file, including: data filename; laboratory or data source; U.S. Geological Survey site ID numbers; data types; constituent (analyte) U.S. Geological Survey parameter codes; descriptions of parameters; units; methods; minimum reporting limits; limits of quantitation, if appropriate; method reference citations; and minimum, maximum, median, and average values for each analyte. The data release also includes an abbreviations file (Abbreviations.pdf) that defines all the abbreviations in the data dictionary and CSV files. Note that the USGS site ID includes a leading zero (011058798) and some of the parameter codes contain leading zeros, so care must be taken in opening and subsequently saving these files in other formats where leading zeros may be dropped.
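Because of the leading zeros noted above, it can help to read the identifier columns as text rather than numbers. A minimal pandas sketch (Blanks.csv is one of the files named above; reading everything as strings is the simplest way to preserve leading zeros):
import pandas as pd

# Read all columns as strings so site IDs such as 011058798 and parameter codes keep their leading zeros.
qa_blanks = pd.read_csv("Blanks.csv", dtype=str)
print(qa_blanks.head())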
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This publication contains several datasets that have been used in the paper "Crowdsourcing open citations with CROCI – An analysis of the current status of open citations, and a proposal" submitted to the 17th International Conference on Scientometrics and Bibliometrics (ISSI 2019), available at https://opencitations.wordpress.com/2019/02/07/crowdsourcing-open-citations-with-croci/.
Additional information about the analyses described in the paper, including the code and the data we have used to compute all the figures, is available as a Jupyter notebook at https://github.com/sosgang/pushing-open-citations-issi2019/blob/master/script/croci_nb.ipynb. The datasets contain the following information.
non_open.zip: it is a zipped (~5 GB unzipped) CSV file containing the numbers of open citations and closed citations received by the entities in the Crossref dump used in our computation, dated October 2018. All the entity types retrieved from Crossref were aligned to one of the following five categories: journal, book, proceedings, dataset, other. The open CC0 citation data we used came from the CSV dump of the most recent release of COCI, dated 12 November 2018. The number of closed citations was calculated by subtracting the number of open citations to each entity available within COCI from the value "is-referenced-by-count" available in the Crossref metadata for that particular cited entity, which reports all the DOI-to-DOI citation links that point to the cited entity from within the whole Crossref database (including those present in the Crossref 'closed' dataset).
The columns of the CSV file are the following ones:
doi: the DOI of the publication in Crossref;
type: the type of the publication as indicated in Crossref;
cited_by: the number of open citations received by the publication according to COCI;
non_open: the number of closed citations received by the publication according to Crossref + COCI.
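As an illustration of how these columns can be used, a minimal sketch computing the share of open citations per entity type (the name of the unzipped CSV file is an assumption):
import pandas as pd

# Columns: doi, type, cited_by (open citations), non_open (closed citations).
df = pd.read_csv("non_open.csv")  # assumed name of the unzipped file

per_type = df.groupby("type")[["cited_by", "non_open"]].sum()
per_type["open_share"] = per_type["cited_by"] / (per_type["cited_by"] + per_type["non_open"])
print(per_type)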
croci_types.csv: it is a CSV file that contains the numbers of open citations and closed citations received by the entities in the Crossref dump used in our computation, as collected in the previous CSV file, aligned in five classes depending on the entity types retrieved from Crossref: journal (Crossref types: journal-article, journal-issue, journal-volume, journal), book (Crossref types: book, book-chapter, book-section, monograph, book track, book-part, book-set, reference-book, dissertation, book series, edited book), proceedings (Crossref types: proceedings-article, proceedings, proceedings-series), dataset (Crossref types: dataset), other (Crossref types: other, report, peer review, reference-entry, component, report-series, standard, posted-content, standard-series).
The columns of the CSV file are the following ones:
type: the type of publication, one of "journal", "book", "proceedings", "dataset", "other";
label: the label assigned to the type for visualisation purposes;
coci_open_cit: the number of open citations received by the publication type according to COCI;
crossref_close_cit: the number of closed citations received by the publication type according to Crossref + COCI.
publishers_cits.csv: it is a CSV file that contains the top twenty publishers that received the greatest number of open citations. The columns of the CSV file are the following ones:
publisher: the name of the publisher;
doi_prefix: the list of DOI prefixes used by the publisher;
coci_open_cit: the number of open citations received by the publications of the publisher according to COCI;
crossref_close_cit: the number of closed citations received by the publications of the publishers according to Crossref + COCI;
total_cit: the total number of citations received by the publications of the publisher (= coci_open_cit + crossref_close_cit).
20publishers_cr.csv: it is a CSV file that contains the numbers of the contributions to open citations made by the twenty publishers introduced in the previous CSV file as of 24 January 2018, according to the data available through the Crossref API. The counts listed in this file refer to the number of publications for which each publisher has submitted metadata to Crossref that include the publication's reference list. The categories 'closed', 'limited' and 'open' refer to publications for which the reference lists are not visible to anyone outside the Crossref Cited-by membership, are visible only to them and to Crossref Metadata Plus members, or are visible to all, respectively. In addition, the file also records the total number of publications for which the publisher has submitted metadata to Crossref, whether or not those metadata include the reference lists of those publications.
The columns of the CSV file are the following ones:
publisher: the name of the publisher;
open: the number of publications in Crossref with 'open' visibility for their reference lists;
limited: the number of publications in Crossref with 'limited' visibility for their reference lists;
closed: the number of publications in Crossref with 'closed' visibility for their reference lists;
overall_deposited: the overall number of publications for which the publisher has submitted metadata to Crossref.
Precipitation, volumetric soil-water content, videos, and geophone data characterizing postfire debris flows were collected at the 2022 Hermit's Peak Calf-Canyon Fire in New Mexico. This dataset contains data from June 22, 2022, to June 26, 2024. The data were obtained from a station located at 35° 42' 28.86" N, 105° 27' 18.03" W (geographic coordinate system). Each data type is described below.
Raw Rainfall Data: Rainfall data, Rainfall.csv, are contained in a comma separated value (.csv) file. The data are continuous and sampled at 1-minute intervals. The columns in the csv file are TIMESTAMP(UTC), RainSlowInt (the depth of rain in each minute [mm]), CumRain (cumulative rainfall since the beginning of the record [mm]), and VWC# (volumetric water content [V/V]) at three depths (1 = 10 cm, 2 = 30 cm, and 3 = 50 cm). VWC values outside of the range of 0 to 0.5 represent sensor malfunctions and were replaced with -99999.
Storm Record: We summarized the rainfall, volumetric soil-water content, and geophone data based on rainstorms. We defined a storm as rain for a duration >= 5 minutes or with an accumulation > 2.54 mm. Each storm was then assigned a storm ID starting at 0. The storm record data, StormRecord.csv, provides peak rainfall intensities and times and volumetric soil-water content information for each storm. The columns from left to right provide the information as follows: ID, StormStart [yyyy-mm-dd hh:mm:ss-tz], StormStop [yyyy-mm-dd hh:mm:ss-tz], StormDepth [mm], StormDuration [h], I-5 [mm h-1], I-10 [mm h-1], I-15 [mm h-1], I-30 [mm h-1], I-60 [mm h-1], I-5 time [yyyy-mm-dd hh:mm:ss-tz], I-10 time [yyyy-mm-dd hh:mm:ss-tz], I-15 time [yyyy-mm-dd hh:mm:ss-tz] ([UTC], the time of the peak 15-minute rainfall intensity), I-30 time [yyyy-mm-dd hh:mm:ss-tz] ([UTC], the time of the peak 30-minute rainfall intensity), I-60 time [yyyy-mm-dd hh:mm:ss-tz] ([UTC], the time of the peak 60-minute rainfall intensity), VWC (volumetric water content [V/V] at three depths (1 = 10 cm, 2 = 30 cm, 3 = 50 cm) at the start of the storm, the time of the peak 15-minute rainfall intensity, and the end of the storm), Velocity [m s-1] of the flow, and Event (qualitative observation of type of flow from video footage). VWC values outside of the range of 0 to 0.5 represent sensor malfunctions and were replaced with -99999. Velocity was only calculated for flows with a noticeable surge as the rest of the signal is not sufficient for a cross-correlation, and Event was only filled for storms with quality video data. Values of -99999 were assigned for these columns for all other storms.
Geophone Data: Geophone data, GeophoneData.zip, are contained in comma separated value (.csv) files labeled by 'storm' and the corresponding storm ID in the storm record and labeled IDa and IDb if the geophone stopped recording for more than an hour during the storm. The data were recorded by two geophones sampled at 50 Hz, one 11.5 m upstream from the station and one 9.75 m downstream from the station. Geophones were triggered to record when 1.6 mm of rain was detected during a period of 10 minutes, and they continued to record for 30 minutes past the last timestamp when this criterion was met. The columns in each csv file are TIMESTAMP [UTC], GeophoneUp_mV (the upstream geophone [mV]), GeophoneDn_mV (the downstream geophone [mV]). Note that there are occasional missed samples when the data logger did not record due to geophone malfunction when data points are 0.04 s or more apart.
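As an example of working with the raw rainfall file, a minimal sketch that parses the timestamp column and masks the -99999 sensor-malfunction values (column names follow the description above; exact header spellings may differ in the file):
import pandas as pd

# Read the 1-minute rainfall record and treat -99999 as missing data.
rain = pd.read_csv("Rainfall.csv", na_values=[-99999])
rain["TIMESTAMP(UTC)"] = pd.to_datetime(rain["TIMESTAMP(UTC)"])
print(rain[["RainSlowInt", "CumRain"]].describe())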
Videos: The videos stormID_mmdd.mp4 (or .mov) are organized by storm ID where one folder contains data for one storm. Within folders for each storm, videos are labeled by the timestamp in UTC of the end of the video as IMGPhhmm. Some videos in the early mornings or late evenings, or in very intense rainfall, have had brightness and contrast adjustments in Adobe Premiere Pro for better video quality and are in MP4 format. All raw videos are in MOV format. The camera triggered when a minimum of 1.6 mm of rain fell in a 10-minute interval and it recorded in 16-minute video clips until it was 30 minutes since the last trigger. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
This volume's release consists of 325099 media files captured by autonomous wildlife monitoring devices under the project, USDA White Mountain National Forest. The attached files listed below include several CSV files that provide information about the data release. The file "media.csv" provides the metadata about the media, such as filename and date/time of capture. The actual media files are housed within folders under the volume's "child items" as compressed files. A critical CSV file is "dictionary.csv", which describes each CSV file, including field names, data types, descriptions, and the relationship of each field to fields in other CSV files. Some of the media files may have been "tagged" or "annotated" by either humans or by machine learning models, identifying wildlife targets within the media. If so, this information is stored in "annotations.csv" and "modeloutputs.csv", respectively. To protect privacy, all personally identifiable information (PII) has been removed, locations have been "blurred" by bounding boxes, and media featuring sensitive taxa or humans have been omitted. To enhance data reuse, the sbRehydrate() function in the AMMonitor R package will download files and re-create the original AMMonitor project (database + media files). See source code at https://code.usgs.gov/vtcfwru/ammonitor.
This volume's release consists of 143321 media files captured by autonomous wildlife monitoring devices under the project, Massachusetts Wildlife Monitoring Project. The attached files listed below include several CSV files that provide information about the data release. The file "media.csv" provides the metadata about the media, such as filename and date/time of capture. The actual media files are housed within folders under the volume's "child items" as compressed files. A critical CSV file is "dictionary.csv", which describes each CSV file, including field names, data types, descriptions, and the relationship of each field to fields in other CSV files. Some of the media files may have been "tagged" or "annotated" by either humans or by machine learning models, identifying wildlife targets within the media. If so, this information is stored in "annotations.csv" and "modeloutputs.csv", respectively. To protect privacy, all personally identifiable information (PII) has been removed, locations have been "blurred" by bounding boxes, and media featuring sensitive taxa or humans have been omitted. To enhance data reuse, the sbRehydrate() function in the AMMonitor R package will download files and re-create the original AMMonitor project (database + media files). See source code at https://code.usgs.gov/vtcfwru/ammonitor.
The anion data for the East River Watershed, Colorado, consists of fluoride, chloride, sulfate, nitrate, and phosphate concentrations collected at multiple, long-term monitoring sites that include stream, groundwater, and spring sampling locations. These locations represent important and/or unique end-member locations for which solute concentrations can be diagnostic of the connection between terrestrial and aquatic systems. Such locations include drainages underlain entirely or largely by shale bedrock, land cover dominated by conifers, aspens, or meadows, and drainages impacted by historic mining activity and the presence of naturally mineralized rock. Developing a long-term record of solute concentrations from a diversity of environments is a critical component of quantifying the impacts of both climate change and discrete climate perturbations, such as drought, forest mortality, and wildfire, on the riverine export of multiple anionic species. Such data may be combined with stream gauging stations co-located at each monitoring site to directly quantify the seasonal and annual mass flux of these anionic species out of the watershed. This data package contains (1) a zip file (anion_data_2014-2022.zip) containing a total of 345 data files of anion data from across the Lawrence Berkeley National Laboratory (LBNL) Watershed Function Scientific Focus Area (SFA), which is reported in .csv files per location; (2) a file-level metadata (flmd.csv) file that lists each file contained in the dataset with associated metadata; and (3) a data dictionary (dd.csv) file that contains terms/column_headers used throughout the files along with a definition, units, and data type. Update on 6/10/2022: versioned updates to this dataset were made along with these changes: (1) updated anion data for all locations up to 2021-12-31, (2) removal of units from column headers in datafiles, (3) added row underneath headers to contain units of variables, (4) restructure of units to comply with CSV reporting format requirements, and (5) the file-level metadata (flmd.csv) and data dictionary (dd.csv) files were added to comply with the File-Level Metadata Reporting Format. Update on 2022-09-09: Updates were made to reporting format specific files (file-level metadata and data dictionary) to correct swapped file names, add additional details on metadata descriptions on both files, add a header_row column to enable parsing, and add version number and date to file names (v2_20220909_flmd.csv and v2_20220909_dd.csv). Update on 2022-12-20: Updates were made to both the data files and reporting format specific files. Conversion issues affecting ER-PLM locations for anion data were resolved for the data files. Additionally, the flmd and dd files were updated to reflect the updated versions of these files. Available data were added up until 2022-03-14.
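Because the data files carry a units row directly beneath the column headers (per the 6/10/2022 update), one simple approach is to skip that row when loading a per-location file. A minimal sketch (the file name is hypothetical; any anion CSV extracted from anion_data_2014-2022.zip applies):
import pandas as pd

# Skip the units row that sits directly below the column headers.
anions = pd.read_csv("anion_example_location.csv", skiprows=[1])  # hypothetical file name
print(anions.head())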
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset provides a comprehensive list of 567 file extensions along with their descriptions, meticulously scraped from a Wikipedia page. It serves as a valuable resource for developers, researchers, and anyone interested in understanding various file types and their purposes.
The dataset contains the following columns:
- File Extension: The extension of the file (e.g., .txt, .jpg).
- Description: A brief description of what the file extension is used for.
This dataset can be used for various purposes, including:
- Building applications that need to recognize and handle different file types.
- Educating and training individuals on file extensions and their uses.
- Conducting research on file formats and their prevalence in different domains.
Keywords: File Extensions, Data Description, CSV, Web Scraping, Beautiful Soup, Wikipedia, Data Analysis, Development, Research
| File Extension | Description |
|---|---|
| .txt | Plain text file |
| .jpg | JPEG image file |
| .pdf | Portable Document Format file |
| .doc | Microsoft Word document file |
| .xlsx | Microsoft Excel spreadsheet file |
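A minimal lookup sketch using the two columns above (the file name file_extensions.csv is hypothetical):
import pandas as pd

# Build a simple extension-to-description lookup from the dataset.
ext = pd.read_csv("file_extensions.csv")  # hypothetical file name
lookup = dict(zip(ext["File Extension"], ext["Description"]))
print(lookup.get(".txt"))  # e.g. "Plain text file"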
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The DIAMAS project investigates Institutional Publishing Service Providers (IPSP) in the broadest sense, with a special focus on those publishing initiatives that do not charge fees to authors or readers. To collect information on Institutional Publishing in the ERA, a survey was conducted among IPSPs between March and May 2024. This dataset contains aggregated data from the 685 valid responses to the DIAMAS survey on Institutional Publishing.
The dataset supplements D2.3, the final IPSP landscape report, "Institutional Publishing in the ERA: results from the DIAMAS survey".
The data
Basic aggregate tabular data
Full individual survey responses are not being shared to prevent the easy identification of respondents (in line with conditions set out in the survey questionnaire). This dataset contains full tables with aggregate data for all questions from the survey, with the exception of free-text responses, from all 685 survey respondents. This includes, per question, overall totals and percentages for the answers given as well as the breakdown by both IPSP types: institutional publishers (IPs) and service providers (SPs). Tables at country level have not been shared, as cell values often turned out to be too low to prevent potential identification of respondents. The data is available in csv and docx formats, with csv files grouped and packaged into ZIP files. Metadata describing data type, question type, as well as question response rate, is available in csv format. The R code used to generate the aggregate tables is made available as well.
Files included in this dataset
survey_questions_data_description.csv - metadata describing data type, question type, as well as question response rate per survey question.
tables_raw_all.zip - raw tables (csv format) with aggregated data per question for all respondents, with the exception of free-text responses. Questions with multiple answers have a table for each answer option. Zip file contains 180 csv files.
tables_raw_IP.zip - as tables_raw_all.zip, for responses from institutional publishers (IP) only. Zip file contains 180 csv files.
tables_raw_SP.zip - as tables_raw_all.zip, for responses from service providers (SP) only. Zip file contains 170 csv files.
tables_formatted_all.docx - formatted tables (docx format) with aggregated data per question for all respondents, with the exception of free-text responses. Questions with multiple answers have a table for each answer option.
tables_formatted_IP.docx - as tables_formatted_all.docx, for responses from institutional publishers (IP) only.
tables_formatted_SP.docx - as tables_formatted_all.docx, for responses from service providers (SP) only.
DIAMAS_Tables_single.R - R script used to generate raw tables with aggregated data for all single response questions
DIAMAS_Tables_multiple.R - R script used to generate raw tables with aggregated data for all multiple response questions
DIAMAS_Tables_layout.R - R script used to generate document with formatted tables from raw tables with aggregated data
DIAMAS Survey on Institutional Publishing - data availability statement (pdf)
All data are made available under a CC0 license.
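To work with the zipped raw tables, the CSV files can be read directly from the ZIP archives. A minimal Python sketch (which table is read first depends on the archive's internal ordering):
import zipfile
import pandas as pd

# List and read the aggregated tables packaged in tables_raw_all.zip.
with zipfile.ZipFile("tables_raw_all.zip") as zf:
    names = zf.namelist()
    print(len(names), "files in the archive")
    with zf.open(names[0]) as f:
        table = pd.read_csv(f)
print(table.head())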
This volume's release consists of 320104 media files captured by autonomous wildlife monitoring devices under the project, Maine Department of Inland Fisheries and Wildlife. The attached files listed below include several CSV files that provide information about the data release. The file "media.csv" provides the metadata about the media, such as filename and date/time of capture. The actual media files are housed within folders under the volume's "child items" as compressed files. A critical CSV file is "dictionary.csv", which describes each CSV file, including field names, data types, descriptions, and the relationship of each field to fields in other CSV files. Some of the media files may have been "tagged" or "annotated" by either humans or by machine learning models, identifying wildlife targets within the media. If so, this information is stored in "annotations.csv" and "modeloutputs.csv", respectively. To protect privacy, all personally identifiable information (PII) has been removed, locations have been "blurred" by bounding boxes, and media featuring sensitive taxa or humans have been omitted. To enhance data reuse, the sbRehydrate() function in the AMMonitor R package will download files and re-create the original AMMonitor project (database + media files). See source code at https://code.usgs.gov/vtcfwru/ammonitor.
This volume's release consists of 26141 media files captured by autonomous wildlife monitoring devices under the project, Indiana Dunes National Park. The attached files listed below include several CSV files that provide information about the data release. The file "media.csv" provides the metadata about the media, such as filename and date/time of capture. The actual media files are housed within folders under the volume's "child items" as compressed files. A critical CSV file is "dictionary.csv", which describes each CSV file, including field names, data types, descriptions, and the relationship of each field to fields in other CSV files. Some of the media files may have been "tagged" or "annotated" by either humans or by machine learning models, identifying wildlife targets within the media. If so, this information is stored in "annotations.csv" and "modeloutputs.csv", respectively. To protect privacy, all personally identifiable information (PII) has been removed, locations have been "blurred" by bounding boxes, and media featuring sensitive taxa or humans have been omitted. To enhance data reuse, the sbRehydrate() function in the AMMonitor R package will download files and re-create the original AMMonitor project (database + media files). See source code at https://code.usgs.gov/vtcfwru/ammonitor.
This volume's release consists of 64642 media files captured by autonomous wildlife monitoring devices under the project, Maine Department of Inland Fisheries and Wildlife. The attached files listed below include several CSV files that provide information about the data release. The file "media.csv" provides the metadata about the media, such as filename and date/time of capture. The actual media files are housed within folders under the volume's "child items" as compressed files. A critical CSV file is "dictionary.csv", which describes each CSV file, including field names, data types, descriptions, and the relationship of each field to fields in other CSV files. Some of the media files may have been "tagged" or "annotated" by either humans or by machine learning models, identifying wildlife targets within the media. If so, this information is stored in "annotations.csv" and "modeloutputs.csv", respectively. To protect privacy, all personally identifiable information (PII) has been removed, locations have been "blurred" by bounding boxes, and media featuring sensitive taxa or humans have been omitted. To enhance data reuse, the sbRehydrate() function in the AMMonitor R package will download files and re-create the original AMMonitor project (database + media files). See source code at https://code.usgs.gov/vtcfwru/ammonitor.
This volume's release consists of 463615 media files captured by autonomous wildlife monitoring devices under the project, New Hampshire Fish and Game Department. The attached files listed below include several CSV files that provide information about the data release. The file "media.csv" provides the metadata about the media, such as filename and date/time of capture. The actual media files are housed within folders under the volume's "child items" as compressed files. A critical CSV file is "dictionary.csv", which describes each CSV file, including field names, data types, descriptions, and the relationship of each field to fields in other CSV files. Some of the media files may have been "tagged" or "annotated" by either humans or by machine learning models, identifying wildlife targets within the media. If so, this information is stored in "annotations.csv" and "modeloutputs.csv", respectively. To protect privacy, all personally identifiable information (PII) has been removed, locations have been "blurred" by bounding boxes, and media featuring sensitive taxa or humans have been omitted. To enhance data reuse, the sbRehydrate() function in the AMMonitor R package will download files and re-create the original AMMonitor project (database + media files). See source code at https://code.usgs.gov/vtcfwru/ammonitor.
The NGEE-Arctic research team identified a common set of hierarchical plant functional types (PFTs) for pan-arctic vegetation that we will use across our research activities. Interdisciplinary work within a large team requires agreement regarding levels of functional organization so that knowledge, data, and technologies can be shared and combined effectively. The team has identified plant functional types as a crucial area where such interoperability is needed. PFTs are used to represent plant pools and fluxes within models, summarize observational data, and map vegetation across the landscape. Within each of these applications, varying levels of PFT specificity are needed according to the specific scientific research goal, computational limitations, and data availability. By agreeing on a specific hierarchical framework for grouping variables in our vegetation data, we ensure the resulting research products will be robust, flexible, and scalable. In this document, we lay out the agreed-upon PFT framework with definitions and references to existing literature. Table 1, included in the "NGA700_Phase4PFTFramework_about*" file, outlines the relationship between NGEE-Arctic Phase 4, Tier 1 PFTs and the PFTs used within prominent arctic literature as well as publications by the NGEE-Arctic team during phases 1-3. This dataset consists of a table detailing a hierarchical PFT framework that spans 4 tiers, with the most granular PFTs listed in tier 1 and the most general PFTs in tier 4. The PFTs within each tier have a single column in the dataset where the PFTs are named and a separate column where the characteristics used to define each PFT are listed. Grey fill of the cells is used to indicate where a given PFT starts to "lose" tier 1 details as you look from left to right. Note that the Excel file has merged cells to indicate the grouping of PFTs across the tiers; it will not translate into a delimited filetype (.csv, .txt, etc.) without modification, so the hierarchical PFT framework table is available in three different file formats: 1) NGA700_Phase4PTS.xlsx – maintains the merged cells and grey fill; 2) NGA700_Phase4PTS.csv – merged cells are split, and grey fill is removed; 3) NGA700_Phase4PTS.pdf – image of the table with merged cells and grey fill. Metadata document included as a *.pdf and file-level metadata and data dictionary as *.csv files.
Provides the renewable energy generation amounts by renewable energy system type. The CSV file contains the renewable energy generation amounts from solar photovoltaic systems and wind power systems, respectively.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The /kaggle/input/online-review-csv/online_review.csv file contains customer reviews from Flipkart. It includes the following columns:
review_id: Unique identifier for each review.
product_id: Unique identifier for each product.
user_id: Unique identifier for each user.
rating: Star rating (1 to 5) given by the user.
title: Summary of the review.
review_text: Detailed feedback from the user.
review_date: Date the review was submitted.
verified_purchase: Indicates if the purchase was verified (true/false).
helpful_votes: Number of users who found the review helpful.
reviewer_name: Name or alias of the reviewer.
Uses:
Sentiment Analysis: Understand customer sentiments.
Product Improvement: Identify areas for product enhancement.
Market Research: Analyze customer preferences.
Recommendation Systems: Improve recommendation algorithms.
This dataset is ideal for practicing data analysis and machine learning techniques.
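As a starting point for the sentiment-analysis use case above, a minimal sketch that loads the file and derives a coarse sentiment label from the star rating (the labeling thresholds are an assumption for illustration):
import pandas as pd

reviews = pd.read_csv("/kaggle/input/online-review-csv/online_review.csv")

# Coarse sentiment from the 1-5 star rating (assumed thresholds).
def label(rating):
    if rating >= 4:
        return "positive"
    if rating == 3:
        return "neutral"
    return "negative"

reviews["sentiment"] = reviews["rating"].apply(label)
print(reviews["sentiment"].value_counts())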
This submission contains an update to the previous Exploration Gap Assessment, funded in 2012, which identified high-potential hydrothermal areas where critical data are needed (a gap analysis on exploration data).
The uploaded data are contained in two data files for each data category: A shape (SHP) file containing the grid, and a data file (CSV) containing the individual layers that intersected with the grid. This CSV can be joined with the map to retrieve a list of datasets that are available at any given site. A grid of the contiguous U.S. was created with 88,000 10-km by 10-km grid cells, and each cell was populated with the status of data availability corresponding to five data types:
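One way to perform the join described above is with geopandas and pandas; a minimal sketch, noting that the file names and the shared key column are assumptions (a common grid-cell identifier is assumed here as grid_id):
import geopandas as gpd
import pandas as pd

# Load the 10-km grid and the layer-availability table, then join on the shared cell id.
grid = gpd.read_file("exploration_grid.shp")            # hypothetical file name
layers = pd.read_csv("exploration_layers.csv")          # hypothetical file name
joined = grid.merge(layers, on="grid_id", how="left")   # "grid_id" is an assumed key column
print(joined.head())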
The attributes in the CSV include: