The Global Ocean Data Analysis Project (GLODAP) is a cooperative effort to coordinate global synthesis projects funded through NOAA/DOE and NSF as part of the Joint Global Ocean Flux Study - Synthesis and Modeling Project (JGOFS-SMP). Cruises conducted as part of the World Ocean Circulation Experiment (WOCE), Joint Global Ocean Flux Study (JGOFS) and NOAA Ocean-Atmosphere Exchange Study (OACES) over the decade of the 1990s have created an oceanographic database of unparalleled quality and quantity. These data provide an important asset to the scientific community investigating carbon cycling in the oceans.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
In recent years, numerous AI tools have been employed to equip learners with diverse technical skills such as coding, data analysis, and other competencies related to computational sciences. However, the desired outcomes have not been consistently achieved. This study analyzes the perspectives of students and professionals from non-computational fields on the use of generative AI tools, augmented with visualization support, to tackle data analytics projects. The focus is on promoting the development of coding skills and fostering a deep understanding of the solutions generated. Consequently, our research seeks to introduce innovative approaches for incorporating visualization and generative AI tools into educational practices.

Methods
This article examines how learners perform, and what their perspectives are, when using traditional tools vs. LLM-based tools to acquire data analytics skills. To explore this, we conducted a case study with a cohort of 59 participants, comprising students and professionals without computational thinking skills, who developed a data analytics project in the context of a short data analytics session. Our case study focused on examining the participants' performance using traditional programming tools, ChatGPT, and LIDA with GPT as an advanced generative AI tool.

Results
The results show the transformative potential of approaches based on integrating advanced generative AI tools like GPT with specialized frameworks such as LIDA. The higher levels of participant preference indicate the superiority of these approaches over traditional development methods. Our findings also suggest that the learning curves for the different approaches vary significantly, since learners encountered technical difficulties in developing the project and interpreting the results. Finally, our findings suggest that the integration of LIDA with GPT can significantly enhance the learning of advanced skills, especially those related to data analytics.
We aim to establish this study as a foundation for the methodical adoption of generative AI tools in educational settings, paving the way for more effective and comprehensive training in these critical areas.

Discussion
It is important to highlight that when using general-purpose generative AI tools such as ChatGPT, users must be aware of the data analytics process and take responsibility for filtering out potential errors or incompleteness in the requirements of a data analytics project. These deficiencies can be mitigated by using more advanced tools specialized in supporting data analytics tasks, such as LIDA with GPT; however, users still need advanced programming knowledge to properly configure this connection via an API. There is a significant opportunity for generative AI tools to improve their performance by providing accurate, complete, and convincing results for data analytics projects, thereby increasing user confidence in adopting these technologies. We hope this work underscores the opportunities and needs for integrating advanced LLMs into educational practices, particularly in developing computational thinking skills.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates Gate household income by age. It can be utilized to understand the age-based distribution of household income in Gate.
Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
Explore our comprehensive data analysis and visual representations for a deeper understanding of the Gate income distribution by age. You can refer to the same here.
Spatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and recreation access across the nation. The PAD-US 3.0 Combined Fee, Designation, Easement feature class (with Military Lands and Tribal Areas from the Proclamation and Other Planning Boundaries feature class) was modified to remove overlaps, avoiding overestimation in protected area statistics and to support user needs. A Python scripted process ("PADUS3_0_CreateVectorAnalysisFileScript.zip") associated with this data release prioritized overlapping designations (e.g. Wilderness within a National Forest) based upon their relative biodiversity conservation status (e.g. GAP Status Code 1 over 2), public access values (in the order of Closed, Restricted, Open, Unknown), and geodatabase load order (records are deliberately organized in the PAD-US full inventory with fee owned lands loaded before overlapping management designations, and easements). The Vector Analysis File ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") associated item of PAD-US 3.0 Spatial Analysis and Statistics ( https://doi.org/10.5066/P9KLBB5D ) was clipped to the Census state boundary file to define the extent and serve as a common denominator for statistical summaries. 
Boundaries of interest to stakeholders (State, Department of the Interior Region, Congressional District, County, EcoRegions I-IV, Urban Areas, Landscape Conservation Cooperative) were incorporated into separate geodatabase feature classes to support various data summaries ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip"). Comma-separated Value (CSV) tables ("PADUS3_0SummaryStatistics_TabularData_CSV.zip") summarizing "PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip" are provided as an alternative format and enable users to explore and download summary statistics of interest (Comma-separated Table [CSV], Microsoft Excel Workbook [.XLSX], Portable Document Format [.PDF] Report) from the PAD-US Lands and Inland Water Statistics Dashboard ( https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-statistics ). In addition, a "flattened" version of the PAD-US 3.0 combined file without other extent boundaries ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") allows for other applications that require a representation of overall protection status without overlapping designation boundaries. The "PADUS3_0VectorAnalysis_State_Clip_CENSUS2020" feature class ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.gdb") is the source of the PAD-US 3.0 raster files (associated item of PAD-US 3.0 Spatial Analysis and Statistics, https://doi.org/10.5066/P9KLBB5D ). Note, the PAD-US inventory is now considered functionally complete, with the vast majority of land protection types represented in some manner, while work continues to maintain updates and improve data quality (see inventory completeness estimates at: http://www.protectedlands.net/data-stewards/ ). In addition, changes in protected area status between versions of the PAD-US may be attributed more to improvements in the completeness and accuracy of the spatial data than to actual management actions or new acquisitions. USGS provides no legal warranty for the use of this data.
While PAD-US is the official aggregation of protected areas ( https://www.fgdc.gov/ngda-reports/NGDA_Datasets.html ), agencies are the best source of their lands data.
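The overlap prioritisation described above (biodiversity conservation status first, then public access, then geodatabase load order) can be sketched in Python. The field names and record values below are illustrative assumptions, not the actual PAD-US schema or scripted process:

```python
# Sketch of the overlap-prioritisation logic described above.
# Field names ("gap_status", "access", "load_order") are assumptions.
ACCESS_ORDER = {"Closed": 0, "Restricted": 1, "Open": 2, "Unknown": 3}

# Two hypothetical overlapping designations.
records = [
    {"name": "National Forest", "gap_status": 2, "access": "Open", "load_order": 1},
    {"name": "Wilderness", "gap_status": 1, "access": "Restricted", "load_order": 2},
]

def priority(rec):
    # Lower tuple sorts first: best conservation status (GAP 1 over 2),
    # then access in the order Closed, Restricted, Open, Unknown,
    # then geodatabase load order.
    return (rec["gap_status"], ACCESS_ORDER[rec["access"]], rec["load_order"])

# The Wilderness designation wins the overlap with the National Forest.
winner = min(records, key=priority)
```

In a real workflow this comparison would be applied to overlapping polygon geometries; here it only shows the ordering rule.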
This statistic shows the results of a survey on data-driven projects, either planned or implemented, among technology magazine readers. In 2015, 27 percent of respondents indicated that their companies had already deployed or implemented a data-driven project. Fewer than a third of respondents said their companies had no plans to deploy or implement a data-driven project.
We have applied 3D shape-based retrieval to various disciplines such as computer vision, CAD/CAM, computer graphics, molecular biology and 3D anthropometry. We have organized two workshops on 3D shape retrieval and two shape retrieval contests. We have also developed 3D shape benchmarks, performance evaluation software and prototype 3D retrieval systems, as well as a robotic map quality assessment tool in collaboration with MEL. We have also developed different shape descriptors to represent 3D human bodies and heads efficiently, and other work related to 3D anthropometry. Finally, we have also done some work in structural bioinformatics and bio-image analysis and retrieval.
DESCRIPTION
Create a model that predicts whether or not a loan will default using the historical data.
Problem Statement:
For companies like LendingClub, correctly predicting whether or not a loan will default is very important. In this project, using historical data from 2007 to 2015, you have to build a deep learning model to predict the chance of default for future loans. As you will see later, this dataset is highly imbalanced and includes many features, which makes this problem more challenging.
Domain: Finance
Analysis to be done: Perform data preprocessing and build a deep learning prediction model.
Content:
Dataset columns and definition:
credit.policy: 1 if the customer meets the credit underwriting criteria of LendingClub.com, and 0 otherwise.
purpose: The purpose of the loan (takes values "credit_card", "debt_consolidation", "educational", "major_purchase", "small_business", and "all_other").
int.rate: The interest rate of the loan, as a proportion (a rate of 11% would be stored as 0.11). Borrowers judged by LendingClub.com to be more risky are assigned higher interest rates.
installment: The monthly installments owed by the borrower if the loan is funded.
log.annual.inc: The natural log of the self-reported annual income of the borrower.
dti: The debt-to-income ratio of the borrower (amount of debt divided by annual income).
fico: The FICO credit score of the borrower.
days.with.cr.line: The number of days the borrower has had a credit line.
revol.bal: The borrower's revolving balance (amount unpaid at the end of the credit card billing cycle).
revol.util: The borrower's revolving line utilization rate (the amount of the credit line used relative to total credit available).
inq.last.6mths: The borrower's number of inquiries by creditors in the last 6 months.
delinq.2yrs: The number of times the borrower had been 30+ days past due on a payment in the past 2 years.
pub.rec: The borrower's number of derogatory public records (bankruptcy filings, tax liens, or judgments).
Steps to perform:
Perform exploratory data analysis and feature engineering, then build a deep learning model to predict whether or not a loan will default using the historical data.
Tasks:
Transform categorical values into numerical values (discrete)
Exploratory data analysis of different factors of the dataset.
Additional Feature Engineering
You will check the correlation between features and will drop those features which have a strong correlation
This will help reduce the number of features and will leave you with the most relevant features
After applying EDA and feature engineering, you are now ready to build the predictive models
In this part, you will create a deep learning model using Keras with Tensorflow backend
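The preprocessing tasks above can be sketched as follows. The synthetic frame stands in for the real LendingClub data, and the target name "not.fully.paid" is an assumed placeholder, since the label column is not listed in the dataset description:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the loan data; column names follow the dataset
# description above, and "not.fully.paid" is an ASSUMED target name.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "credit.policy": rng.integers(0, 2, n),
    "purpose": rng.choice(["credit_card", "debt_consolidation", "educational"], n),
    "int.rate": rng.uniform(0.06, 0.22, n),
    "fico": rng.integers(600, 850, n),
    "not.fully.paid": rng.integers(0, 2, n),
})

# Task 1: transform the categorical 'purpose' column into numerical indicators.
df = pd.get_dummies(df, columns=["purpose"], drop_first=True)

# Additional feature engineering: drop one feature from every pair with a
# strong correlation (here |r| > 0.9), keeping the most relevant features.
X = df.drop(columns=["not.fully.paid"]).astype(float)
mask = np.triu(np.ones((X.shape[1], X.shape[1]), dtype=bool), k=1)
upper = X.corr().abs().where(mask)
to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
X = X.drop(columns=to_drop)

# X is now ready to feed into a Keras model (e.g. Dense layers with a
# sigmoid output trained with binary cross-entropy), which is omitted here.
```

The Keras model itself is left out of the sketch; only the EDA and feature-engineering steps are shown.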
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Hypothesis: Reliability can be adopted to quantitatively measure the sustainability of megaprojects.
Presentation: This dataset shows two scenario-based examples to establish an initial reliability assessment of megaproject sustainability. Data were generated from the author's assumptions regarding differences between scenarios A and B. There are two sheets in this Microsoft Excel file: a comparison between the two scenarios using a Fault Tree Analysis model, and a correlation analysis between reliability and unavailability.
Notable findings: This exploratory experiment found that reliability can be used to quantitatively measure megaproject sustainability, and that there is a negative correlation between reliability and unavailability among 11 related events associated with sustainability goals in the life cycle of a megaproject.
Interpretation: Results from data analysis using the two sheets can be useful to inform decision making on megaproject sustainability. For example, the reliability of achieving sustainability goals can be enhanced by decreasing the unavailability or failure at individual work stages in megaproject delivery.
Implication: This dataset file can be used to perform reliability analysis in other experiments to assess megaproject sustainability.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Maddison Project Dataset 2020 Population by Region’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mathurinache/maddison-project-dataset-2020-population-by-region on 28 January 2022.
--- Dataset description provided by original source is as follows ---
The Maddison Project Database provides information on comparative economic growth and income levels over the very long run. The 2020 version of this database covers 169 countries and the period up to 2018. For questions not covered in the documentation, please contact maddison@rug.nl.
We now offer a new 2020 update of the Maddison Project database, which uses a different methodology compared to the 2018 update. The approach of the 2018 update is identical to that of Penn World Tables, and consistent with recent economic and statistical research in this field. However, applying this approach systematically results in historical outcomes that are not consistent with current insights by economic historians, as explained in Bolt and Van Zanden (2020).
The 2020 update has to some extent gone back to the original Maddison approach to remedy this (see documentation). Both the 2018 and 2020 datasets incorporate the available recent work by economic historians on long-term economic growth; the 2020 version is the most complete in this respect.
Attribution requirement -
All original papers must be cited when:
a) the data is shown in any graphical form, or
b) subsets of the full dataset that include less than a dozen (12) countries are used for statistical analysis or any other purposes.
A list of original papers can be found in the source sheet of the database. When neither a) nor b) applies, the MPD as a whole should be cited.
Maddison Project Database, version 2020. Bolt, Jutta and Jan Luiten van Zanden (2020), “Maddison style estimates of the evolution of the world economy. A new 2020 update”.
You can find some inspiration here : https://ourworldindata.org/global-economic-inequality-introduction
--- Original source retains full ownership of the source dataset ---
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
About Datasets:
Domain: Marketing
Project: App Review Text Analysis
Dataset: linkedin-reviews
Dataset Type: Excel Data
Dataset Size: 1k+ records
KPIs:
1. Distribution of Ratings
2. Distribution of Review Lengths
3. Distribution of Sentiments
4. Sentiment Distribution across Ratings
Process:
1. Understanding the problem
2. Data Collection
3. Perform EDA by analyzing the length of the reviews and their ratings
4. Label the sentiment data using tools like TextBlob or NLTK
5. Understanding the overall distribution of sentiments (positive, negative, neutral) in the dataset
6. Explore the relationship between the sentiments and the ratings given
7. Analyze the text of the reviews to identify common themes or words in different sentiment categories
8. Interpreting the results
The analysis uses pandas, matplotlib, and seaborn, with functions and attributes such as countplot, histplot, textblob.sentiment, sentiment.polarity, value_counts, barplot, hue, .join, wordcloud, imshow, interpolation, figsize, and generate_word_cloud(sentiment).
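The sentiment-labelling and distribution steps of the process can be sketched as follows. The tiny word lists here are a toy stand-in for TextBlob (a real pipeline would call TextBlob(text).sentiment.polarity), and the three reviews are fabricated for illustration:

```python
import pandas as pd

# Toy lexicon as a stand-in for TextBlob's polarity scoring (assumption).
POSITIVE = {"great", "good", "love", "excellent", "useful"}
NEGATIVE = {"bad", "poor", "terrible", "spam"}

def polarity(text: str) -> float:
    # Crude polarity in [-1, 1]: positive minus negative word hits per word.
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return score / max(len(words), 1)

def label(p: float) -> str:
    return "positive" if p > 0 else ("negative" if p < 0 else "neutral")

# Fabricated sample reviews standing in for the linkedin-reviews data.
reviews = pd.DataFrame({
    "review": ["Great app, love it", "Too many ads, terrible", "It works"],
    "rating": [5, 2, 3],
})
reviews["length"] = reviews["review"].str.len()               # review lengths
reviews["sentiment"] = reviews["review"].map(polarity).map(label)

sentiment_counts = reviews["sentiment"].value_counts()        # sentiment distribution
by_rating = pd.crosstab(reviews["sentiment"], reviews["rating"])  # sentiment vs. rating
```

The resulting counts and cross-tabulation would feed the countplot/barplot visualisations mentioned above.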
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
European Marine Energy Centre (EMEC) wildlife observation data has been collected from the marine renewable test sites at Billia Croo and Fall of Warness in Orkney. These data have been processed and cleansed and used in reports prepared by EMEC, for SNH and Marine Scotland.
This table contains a new catalog of 5106 infrared bubbles created through visual classification via the online citizen science website 'The Milky Way Project' (MWP). Bubbles in the new catalog have been independently measured by at least five individuals, producing consensus parameters for their positions, radii, thicknesses, eccentricities and position angles. Citizen scientists - volunteers recruited online and taking part in this research - have independently rediscovered the locations of at least 86% of three widely used catalogs of bubbles and H II regions while finding an order of magnitude more objects. 29% of the bubbles in the Milky Way Project catalog lie on the rim of a larger bubble, or have smaller bubbles located within them, opening up the possibility of better statistical studies of triggered star formation. This online resource of the Milky Way Project provides a crowd-sourced map of bubbles and arcs in the Milky Way, and will enable better statistical analysis of Galactic star formation sites. This table is the first data release of the MWP IR Bubble Catalog: the authors anticipate a future release of a second, refined catalog incorporating better data-reduction techniques. This table was created by the HEASARC in March 2013 based on the CDS Catalog J/MNRAS/424/2442 files mwplarge.dat and mwpsmall.dat. This is a service provided by NASA HEASARC.
The purpose of the coalbed methane geostatistical study was to identify correlations between geologic parameters and gas production for wells completed in the Oak Grove field in Alabama. The study area consisted of 23 wells in the primary degasification grid and 15 wells adjacent to the grid. All data obtained from reports from lineament maps were screened for consistency and accuracy prior to any analyses being performed. The primary analysis of the data consisted of multivariate statistical procedures. The intent of the analysis was to provide information on what effects various geologic and lineament variables had on gas production. It was also intended that the variables having the most important effects on gas production be identified and used for predicting production categories. Principal components analysis was used to establish the well groupings. Results indicated that wells could be grouped based upon two primary criteria: (1) the well's overall production and (2) a comparison of the early production of a well relative to the production realized later in its life. In effect, four unique groupings could be formed based upon the production profiles of the wells. Results of stepwise discriminant analysis, in which the four production categories were examined, indicated that four variables were considered to be important: well elevation, number of lineament intersections within 250 feet, thickness of the Blue Creek coal seam, and the length of the nearest lineament. A means by which a well could be classified into one of the production categories based upon measurements of the four more important variables was also developed. Results of the classification of the wells inside the degasification grid were encouraging, with 91% of the wells being correctly classified. When the classification methods developed based upon wells inside the primary degasification grid were applied to the 15 wells outside the grid, the results were not as encouraging.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘PoroTomo Project: Brady's Geothermal Field, Subtask 3.5: GPS Data Analysis’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/c6e77e20-1cab-4053-894a-b0edcb1df117 on 26 January 2022.
--- Dataset description provided by original source is as follows ---
Links to GPS RINEX data not previously reported, plus links to station web pages, which include most up-to-date time-series
--- Original source retains full ownership of the source dataset ---
Analysis of the projects proposed by the seven finalists in USDOT's Smart City Challenge, including the challenge addressed, proposed project category, and project description. The times reported for the speed profiles are between 2:00 PM and 8:00 PM, in increments of 10 minutes.
Global Ocean Data Analysis Project for Carbon (GLODAP)

_NCProperties=version=1|netcdflibversion=4.6.1|hdf5libversion=1.8.12
cdm_data_type=Profile
cdm_profile_variables=time,latitude,longitude
citation=These data were collected and made freely available by the Copernicus project and the programs that contribute to it. Cite as Olsen, A.; Key, R. M.; van Heuven, S.; Lauvset, S. K.; Velo, A.; Lin, X.; Schirnick, C.; Kozyr, A.; Tanhua, T.; Hoppema, M.; Jutterström, S.; Steinfeldt, R.; Jeansson, E.; Ishii, M.; Pérez, F. F.; and Suzuki, T. (2016). The Global Ocean Data Analysis Project version 2 (GLODAPv2) – an internally consistent data product for the world ocean, Earth Syst. Sci. Data, 8, 297-323, https://doi.org/10.5194/essd-8-297-2016
Conventions=CF-1.6, COARDS, ACDD-1.3
data_type=OceanSITES vertical profile
Easternmost_Easting=-60.4437
featureType=Profile
geospatial_lat_max=44.6973
geospatial_lat_min=32.93
geospatial_lat_units=degrees_north
geospatial_lon_max=-60.4437
geospatial_lon_min=-76.36
geospatial_lon_units=degrees_east
geospatial_vertical_max=4986.0
geospatial_vertical_min=0.0
geospatial_vertical_positive=down
geospatial_vertical_units=m
history=Created by Eli Hunter (hunter@marine.rutgers.edu), 25-Mar-2020 16:16:27
infoUrl=http://marine.copernicus.eu
institution=COPERNICUS MARINE
keywords_vocabulary=GCMD Science Keywords
Northernmost_Northing=44.6973
references=http://marine.copernicus.eu https://www.glodap.info
sourceUrl=Myocean
Southernmost_Northing=32.93
standard_name_vocabulary=CF Standard Name Table v55
subsetVariables=time
time_coverage_end=2012-03-29T20:12:11Z
time_coverage_start=1981-04-02T00:00:00Z
Type=GLODAP Observation File: DOPPIO DOMAIN
Westernmost_Easting=-76.36
The research was envisaged to last 30 months. One consequence of the pandemic was that initial access in early 2020 was challenging, and we sought an extension to 36 months. Hence the project began in early 2020 and ran until the end of 2022. There were two phases to the project. Phase one entailed a Qualitative Comparative Analysis (QCA) to analyse conditions across 10 cases of social innovation in and around MNCs. Phase two consisted of semi-structured interviews examining four research questions related to social innovation: i) interests and motivations of social innovation; ii) skills and resources of social innovation; iii) inhibiting and enabling institutional factors of social innovation; and iv) outcomes of social innovation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present a dataset of open source software developed mainly by enterprises rather than volunteers. This can be used to address known generalizability concerns and also to perform research on open source business software development. Based on the premise that an enterprise's employees are likely to contribute to a project developed by their organization using the email account provided by it, we mine domain names associated with enterprises from open data sources as well as through white- and blacklisting, and use them through three heuristics to identify 17,264 enterprise GitHub projects. We provide these as a dataset detailing their provenance and properties. A manual evaluation of a dataset sample shows an identification accuracy of 89%. Through an exploratory data analysis we found that projects are staffed by a plurality of enterprise insiders, who appear to be pulling more than their weight, and that in a small percentage of relatively large projects development happens exclusively through enterprise insiders.
The main dataset is provided as a 17,264 record tab-separated file named enterprise_projects.txt with the following 29 fields.
url: the project's GitHub URL
project_id: the project's GHTorrent identifier
sdtc: true if selected using the same domain top committers heuristic (9,016 records)
mcpc: true if selected using the multiple committers from a probable company heuristic (8,314 records)
mcve: true if selected using the multiple committers from a valid enterprise heuristic (8,015 records)
star_number: number of GitHub watchers
commit_count: number of commits
files: number of files in current main branch
lines: corresponding number of lines in text files
pull_requests: number of pull requests
github_repo_creation: timestamp of the GitHub repository creation
earliest_commit: timestamp of the earliest commit
most_recent_commit: date of the most recent commit
committer_count: number of different committers
author_count: number of different authors
dominant_domain: the project's dominant email domain
dominant_domain_committer_commits: number of commits made by committers whose email matches the project's dominant domain
dominant_domain_author_commits: corresponding number for commit authors
dominant_domain_committers: number of committers whose email matches the project's dominant domain
dominant_domain_authors: corresponding number for commit authors
cik: SEC's EDGAR "central index key"
fg500: true if this is a Fortune Global 500 company (2,233 records)
sec10k: true if the company files SEC 10-K forms (4,180 records)
sec20f: true if the company files SEC 20-F forms (429 records)
project_name: GitHub project name
owner_login: GitHub project's owner login
company_name: company name as derived from the SEC and Fortune 500 data
owner_company: GitHub project's owner company name
license: SPDX license identifier
The file cohort_project_details.txt provides the full set of 311,223 cohort projects that are not part of the enterprise data set but have comparable quality attributes.
url: the project's GitHub URL
project_id: the project's GHTorrent identifier
stars: number of GitHub watchers
commit_count: number of commits
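Both files are tab-separated and can be loaded with pandas; a minimal sketch follows, where the inlined sample row is fabricated for illustration and not taken from the real dataset:

```python
import io
import pandas as pd

# Fabricated one-row sample mimicking a subset of enterprise_projects.txt;
# in practice you would pass the file path instead of a StringIO buffer.
sample = (
    "url\tproject_id\tsdtc\tmcpc\tmcve\tstar_number\tcommit_count\n"
    "https://github.com/example/project\t42\tTrue\tFalse\tFalse\t120\t3456\n"
)
df = pd.read_csv(io.StringIO(sample), sep="\t")

# Select projects identified by the same-domain top committers heuristic.
sdtc_projects = df[df["sdtc"] == True]
```

Filtering on the three heuristic flags (sdtc, mcpc, mcve) lets users study each identification method separately.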
The objective of the endline surveys in 2016 was to gather household, biomedical, and cognition data in order to evaluate the long-term impact of home supplementation with micronutrient powders (MNP), when combined with seasonal malaria chemoprevention (SMC) and early stimulation, delivered through community preschools and parenting sessions, on the health and cognitive development of children during the first five years of life.
The trial consisted of 3 arms. First, 60 villages with established Early Childhood Development centres (ECD) were randomised to 1 of 2 arms:
1) Children living in villages in the ECD control arm received SMC as part of national health programming and a national parenting intervention delivered by ECD center staff trained and supported by Save the Children, with ALL resident children eligible to participate in the interventions irrespective of enrolment in ECD program (ECD Control group).
2) Children living in villages in the intervention arm also received the SMC and parenting interventions described above, but additionally were eligible to receive home supplementation with micronutrient powders (MNP intervention arm).
3) Second, a third, non-randomised arm was recruited, comprising children living in 30 randomly selected villages where there were no ECD centers in place and thus both the parenting interventions and MNPs were absent. These children received SMC only, as part of national health programming (non-ECD comparison arm).
Trial arm and Interventions received:
T1. MNP intervention arm: 30 villages with ECD centre (randomised); MNP-Yes, Parenting-Yes, SMC-Yes
C1. ECD control arm: 30 villages with ECD centre (randomised); MNP-No, Parenting-Yes, SMC-Yes
C2. Non-ECD comparison arm: 30 villages without ECD centre (not randomised); MNP-No, Parenting-No, SMC-Yes
Three cross-sectional endline surveys took place during the period May-August 2016, three years after the original MNP intervention began, and consisted of the following questionnaires and assessments in two age groups of children, 3 year olds and 5 year olds:
i) A household questionnaire was used to collect data from the primary adult caregiver of the child on home environment, exposure to the interventions, and reported practice outcomes of relevance to the parenting intervention.
ii) Biomedical outcomes were measured in children through laboratory and clinical assessment.
iii) A battery of tests was used to assess cognitive performance and school readiness in children, using a different age-specific test battery for each age group, adapted for local language and culture.
Note: Household and cognitive performance data were gathered from participants in all three arms. Biomedical data were only collected from children in the two randomised arms, to evaluate impact of MNP supplementation on anaemia (primary biomedical outcome) in children who received MNPs and those who did not, using a robust study design.
Districts (cercles) of Sikasso and Yorosso, Region of Sikasso
Individuals and communities
Random sample of target population for the intervention in the 90 communities that consented to participate in the trial, namely pre-school children 0-6 years.
Sample survey data [ssd]
The target population for the interventions comprised all children aged 3 months to 6 years, who were resident in the 90 study communities participating in the trial; the primary sampling unit is the individual child.
Sample Frame:
To identify the number of target beneficiaries, a complete census of all children of eligible age was carried out in the 90 study villages in August 2013. The census listing from 2013 thus defined the population of children who were eligible to have received the interventions every year for the three years between 2013-2016, and was used as the sampling frame of children in whom the impact after three years of implementation of the interventions was evaluated. The intention was to evaluate study outcomes in the same child one year after the start of the MNP intervention (May 2014) and again after three years of the intervention (2016).
A random sample of children was drawn from all children listed in the census for each community participating in the trial, according to the following age criteria:
Date of birth, or age in August 2013 (age group in 2016 surveys):
(i) Born between 1 Jan 2013 – 30 June 2013, or aged <1 year in the 2013 census if DOB not known (aged 3 years in 2016 surveys)
(ii) Born between 1 May 2010 – 30 April 2011, or aged 2 years in the census if DOB not known (aged 5 years in 2016 surveys)
Thus, all children previously randomly selected and enrolled in the evaluation cohort in 2014 were, if still resident in the village and present on the day of the survey, re-surveyed in May 2016.
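The per-village random draw described above can be sketched as follows. This is an illustrative reconstruction only: the record layout, field names, synthetic census data, and the sample-per-village size are assumptions, not the study's actual data format or software.

```python
import random
from datetime import date

def draw_cohort(census, dob_start, dob_end, per_village, seed=2013):
    """Randomly sample up to `per_village` eligible children from each
    village's census listing.

    census: list of (child_id, village, date_of_birth) tuples.
    Eligibility is defined by a date-of-birth window, mirroring the
    age criteria used to define the two evaluation cohorts.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible draw
    by_village = {}
    for child_id, village, dob in census:
        if dob_start <= dob <= dob_end:
            by_village.setdefault(village, []).append(child_id)
    sample = {}
    for village, ids in by_village.items():
        k = min(per_village, len(ids))  # villages may have few eligibles
        sample[village] = rng.sample(ids, k)
    return sample

# Illustrative synthetic census for two villages (the real frame listed
# all eligible children in 90 villages).
census = [(f"C{i:03d}", f"village_{i % 2}", date(2013, 1, 1 + i % 28))
          for i in range(60)]

# Draw the 3-year-old cohort: born 1 Jan 2013 - 30 June 2013,
# 20-25 children sampled per village.
cohort_3y = draw_cohort(census, date(2013, 1, 1), date(2013, 6, 30),
                        per_village=25)
```

In the actual survey the same logic was applied per age criterion, and the 2016 round re-surveyed the children drawn in 2014 rather than redrawing.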
Sample Size:
Power analysis was undertaken for a comparison of two arms, taking account of clustering by community. Survey data on biomedical and cognitive outcomes collected in 2014 were used to inform sample size assumptions, including prevalence of primary outcomes, intraclass correlation (ICC) and number of children recruited per cluster. Prevalence of anaemia amongst 3-year-old children in 2014 was found to be 61.6% and 64.0% in the intervention and control arms respectively (p=0.618), and 53.8% and 51.9% respectively amongst 5-year-old children (p=0.582). The observed ICC for the anaemia endpoint at baseline was 0.08 in 3-year-old children and 0.06 in 5-year-old children. The observed ICC for cognitive outcomes measured in 2014 was 0.09, ranging from 0.05 to 0.16 for individual tasks within the cognitive battery.
Sample Size Estimation for Health Outcomes:
Approximately 20-25 children per cluster were recruited into each age cohort in 2013. Power calculations for anaemia (primary endpoint) were undertaken for three alternative scenarios at endline: (i) to allow for the possibility of up to 20% loss to follow up between 2014 and 2016, power calculations were performed for a sample size at endline of 16 children per cluster; (ii) a smaller cluster size of 14 children sampled per village, under a scenario of 30% loss to follow-up; and (iii) unequal clusters, to allow for the possibility that variation in losses to follow-up between villages could result in an unequal number of children sampled in each village. In this case, cluster size is the mean number of children sampled per cluster.
Thus, assuming a conservative prevalence of anaemia of 50% in the control group and an ICC of 0.08, a sample size of 30 communities per arm with 14-20 children sampled per community will, under all of these scenarios, provide 80% power to detect a reduction in anaemia of at least 28% at the 5% level of significance.
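The design-effect arithmetic behind these scenarios can be reproduced with a short script. This is a sketch using the standard normal approximation for comparing two proportions with variance inflated by DEFF = 1 + (m - 1)·ICC; it is not the study team's actual calculation, and the resulting power values are approximate.

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def cluster_power_two_proportions(p_control, p_interv, clusters_per_arm,
                                  children_per_cluster, icc, z_alpha=1.959964):
    """Approximate power for a two-arm comparison of proportions in a
    cluster-randomised trial, inflating the variance by the design
    effect DEFF = 1 + (m - 1) * ICC."""
    m = children_per_cluster
    deff = 1.0 + (m - 1) * icc
    n = clusters_per_arm * m  # children per arm
    var = deff * (p_control * (1 - p_control)
                  + p_interv * (1 - p_interv)) / n
    z = abs(p_control - p_interv) / sqrt(var)
    return norm_cdf(z - z_alpha)

# Protocol scenario: 50% anaemia in controls, a 28% relative reduction
# (to 36%), 30 communities per arm, ICC = 0.08, cluster sizes 14 and 20.
for m in (14, 20):
    print(m, round(cluster_power_two_proportions(0.50, 0.36, 30, m, 0.08), 3))
```

Under these assumptions both cluster-size scenarios come out above 80% power, consistent with the conclusion quoted above.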
Sample Size Estimation for Cognitive Outcomes:
Power calculations for cognitive outcomes explored: (i) a smaller cluster size of 14 children sampled per village, for example resulting from a higher than expected loss to follow-up of 30%; (ii) statistical analysis of differences between arms which does not adjust for baseline - a scenario which allows for the possibility of increasing the sample size to compensate for losses to follow-up by recruiting new children for whom no baseline data would be available; and (iii) the effect of unequal clusters. Thus, for cognitive-linguistic skills, a sample size of 30 communities per arm with 14-20 children in each age cohort sampled per community will provide 80% power to detect an effect size between 0.27-0.29 at the 5% level of significance, assuming an ICC of 0.10 and that individual, household and community-level factors account for at least 25% of variation in cognitive foundation skills. For a similar sample size of 30 communities per arm with 14-20 children sampled per community and an ICC of 0.10, a statistical analysis which does not adjust for baseline will provide 80% power to detect an effect size between 0.28-0.30 at the 5% level of significance.
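The unadjusted detectable effect sizes quoted above follow from the same design-effect logic. The sketch below assumes the standard minimal-detectable-effect-size formula MDES = (z_alpha + z_beta) * sqrt(2 * DEFF / n) with conventional critical values; the study's exact software and any baseline-adjustment model are not stated here, so this reproduces only the unadjusted scenario approximately.

```python
from math import sqrt

def mdes_cluster(clusters_per_arm, children_per_cluster, icc,
                 z_alpha=1.959964, z_beta=0.841621):
    """Minimal detectable effect size (in SD units) for a continuous
    outcome compared between two arms of a cluster-randomised trial,
    without baseline adjustment. Defaults correspond to 5% two-sided
    significance and 80% power."""
    m = children_per_cluster
    deff = 1.0 + (m - 1) * icc       # design effect for clustering
    n = clusters_per_arm * m         # children per arm
    return (z_alpha + z_beta) * sqrt(2.0 * deff / n)

# 30 communities per arm, ICC = 0.10, cluster sizes 14 and 20.
for m in (14, 20):
    print(m, round(mdes_cluster(30, m, 0.10), 3))
```

With these assumptions the unadjusted MDES falls in roughly the 0.28-0.29 range across the two cluster sizes, in line with the 0.28-0.30 band reported; adjusting for covariates that explain a share R² of the outcome variance would shrink the MDES further, by a factor of about sqrt(1 - R²).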
The sample at endline in May 2016 thus comprised up to 600 children aged 3y and 600 children aged 5y in each arm:
(i) T1 Intervention group (with ECD): 30 communities, with approx. 40 randomly selected children in each community (20 aged 3y; 20 aged 5y).
(ii) C1 ECD control group (with ECD): 30 communities, with approx. 40 randomly selected children in each community (20 aged 3y; 20 aged 5y).
(iii) C2 Comparison group (without ECD): 30 communities, with approx. 40 randomly selected children in each community (20 aged 3y; 20 aged 5y).
Strategy for Absent Respondents/Not Found/Refusals:
Every effort was made to trace children previously recruited into the evaluation cohort. Since some losses to follow-up (for example, due to child deaths or outward migration) were expected between 2014 and 2016, the primary strategy was to oversample in 2014. However, for villages where loss to follow-up was higher than expected and it was not possible to trace a sufficient number of children remaining from the original sample to meet the required sample size per cluster, additional children were recruited into the evaluation survey in 2016. New recruits were selected at random from the children listed as resident in the village at the time of the original census in 2013. All new recruits had thus been resident in the village and exposed to the interventions throughout the three preceding years.
Face-to-face [f2f]
The questionnaires for the parent interview were structured questionnaires, administered to the child's primary caregiver.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This project involves conducting and analyzing listening tests using modified webMUSHRA software to evaluate the perceptual accuracy of simulated acoustic environments. The code is structured into five main directories: Docs, containing ethics documents; Modified webMUSHRA Software, including testing code and configurations run with Docker for paired_comparison and subjective_eval tests; Results, storing both raw and processed data from the listening tests; Samples, providing original and convolved audio files with real and simulated Room Impulse Responses (RIRs); and Utils, featuring scripts for generating sine sweeps, convolving and clipping audio, and performing basic statistical analysis. For questions, contact b.christensen@student.tudelft.nl.