7 datasets found
  1. Water isotopes of sea water analyzed since 1998 at LOCEAN

    • seanoe.org
    • pigma.org
    csv
    Updated Oct 2024
    + more versions
    Cite
    waterisotopes-CISE-LOCEAN (2024). Water isotopes of sea water analyzed since 1998 at LOCEAN [Dataset]. http://doi.org/10.17882/71186
    Explore at:
    Available download formats: csv
    Dataset updated
    Oct 2024
    Dataset provided by
    SEANOE
    Authors
    waterisotopes-CISE-LOCEAN
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 31, 1997 - Dec 31, 2023
    Area covered
    Description

    LOCEAN has been in charge of collecting sea water for the analysis of water isotopes on a series of cruises or ships of opportunity, mostly in the equatorial Atlantic, the North Atlantic, the southern Indian Ocean, the southern seas, the Nordic seas, and the Arctic. The LOCEAN data set of the oxygen and hydrogen isotopes (δ18O and δD) of marine water covers the period 1998 to 2019, but the effort is ongoing. Most data prior to 2010 (δ18O only) were analyzed using isotope ratio mass spectrometry (Isoprime IRMS) coupled with a MultiPrep system (dual-inlet method), whereas most data since 2010 (and a few earlier data) were obtained by cavity ring-down spectrometry (CRDS) on a Picarro CRDS L2130-i, or less commonly on a Picarro CRDS L2120-i. Occasionally, some data were also run by Marion Benetti on an Isoprime IRMS coupled to a GasBench (dual-inlet method) at the University of Iceland (Reykjavik). On the LOCEAN Picarro CRDS, most samples were initially analyzed after distillation, but since 2016 they have often been analyzed using a wire mesh to limit the spreading of sea salt in the vaporizer. Some of the samples on the CRDS were analyzed more than once on different days, when repeatability for the same sample was not sufficient or the daily run presented too large a drift.

    Accuracy is best when samples are distilled, and δD accuracy is better on the Picarro CRDS L2130-i than on the Picarro CRDS L2120-i. Usually, we found that the reproducibility of the δ18O measurements is within ± 0.05 ‰ and of the δD measurements within ± 0.30 ‰, which should be considered an upper estimate of the measurement error on a Picarro CRDS.

    The water samples were kept in darkened glass bottles (20 to 50 ml) with special caps and were often (but not always) taped afterwards. Once brought back to Paris, the samples were often stored in a cold room (at a temperature close to 4°C), in particular if they were not analyzed within the next three months. There is, however, the possibility that some samples breathed during storage. We found this happening on a number of samples, more commonly when they were stored for more than 5 years before being analyzed. During one cruise we also used bottles with poorly sealed caps (M/V Nuka Arctica in April 2019), which were analyzed within 3 months, but for which close to one third of the samples had breathed. We have retained those analyses, but added a flag '3', meaning probably bad, at least for d-excess (outside of regions where sea ice forms or melts, for the analyses done on the Picarro CRDS, excessive evaporation is usually detected with a d-excess criterion, d-excess tending to be too low; for the IRMS analyses, it is mostly based on excessive scatter in the S-δ18O scatter plots or between successive data, in which case some outliers were flagged '3'). In some cases when breathing happened, we found that d-excess can be used to produce a corrected estimate of δ18O and δD (Benetti et al., 2016). When this method was used, a flag '1' is added, indicating 'probably good' data, which should be regarded as less accurate than the data with no 'correction', which are flagged '2' or '0'.

    We have adjusted the data to be on an absolute fresh-water scale based on the study of Benetti et al. (2017), and on further tests with the different wire meshes used more recently. We have also checked the consistency of the runs in time, as there could have been changes in the internal standards used. On the Isoprime IRMS, this was mostly done using different batches of 'eau de Paris' (EDP), whereas on the Picarro CRDS we used three internal standards kept in metal tanks with a slight overpressure of dry air. The internal standards have been calibrated using VSMOW and GISP, and were also sent to other laboratories to evaluate whether they had drifted since the date of creation (as individual sub-standards have typically been stored for more than 5 years). These comparisons are still not fully satisfactory for evaluating possible drifts in the sub-standards.

    Individual files correspond to regional subsets of the whole dataset. The file names are based on two letters for the region (see below) followed by -wisotopes and a version number (-v0, ...), for example so-wisotopes-v0; the highest version number corresponds to the latest update of the regional data set. The two-letter region codes are the following: SO: Southern Ocean, including cruise station and surface data mostly from 2017 in the Weddell Sea (WAPITI cruise JR160004, doi:10.17882/54012), as well as in the Southern Ocean south of 20°S. SI: OISO cruise station and surface data in the southern Indian Ocean (since 1998) (doi:10.18142/228). EA: 20°N-20°S cruise station and surface data (since 2005), in particular in the equatorial Atlantic from French PIRATA (doi:10.18142/14) and EGEE cruises (doi:10.18142/95). NA: 20°N-72°N station and surface data, mostly in the North[...]
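
    For a quick look at one of the regional CSV files, a minimal pandas sketch is given below; the file name and the column names (d18O, dD, flag) are assumptions to be checked against the actual CSV header, and d-excess is computed with the standard definition d = δD − 8·δ18O referenced by the quality flags above.

    ```python
    # Minimal sketch (pandas assumed). File and column names are hypothetical;
    # check the header of the regional CSV (e.g. so-wisotopes-v0.csv) first.
    import pandas as pd

    df = pd.read_csv("so-wisotopes-v0.csv")

    # d-excess = dD - 8 * d18O, the quantity used in the flagging described above.
    df["d_excess"] = df["dD"] - 8.0 * df["d18O"]

    # Keep flags 0/1/2 (good or probably good); drop flag 3 ("probably bad").
    good = df[df["flag"].isin([0, 1, 2])]
    print(good[["d18O", "dD", "d_excess"]].describe())
    ```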

  2. Scatter plot of the samples in the prostate cancer dataset contributed by...

    • plos.figshare.com
    tiff
    Updated Jun 8, 2023
    Cite
    Regina Berretta; Pablo Moscato (2023). Scatter plot of the samples in the prostate cancer dataset contributed by True et al. [Dataset]. http://doi.org/10.1371/journal.pone.0012262.g013
    Explore at:
    Available download formats: tiff
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Regina Berretta; Pablo Moscato
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We have used the same color coding convention as in Figure 12. We plot the values of two modified statistical complexities, which we call M-Gleason 3 and M-Gleason 5. Instead of using the equiprobable distribution as the reference probability distribution (for the computation of the Jensen-Shannon divergence of the gene expression profile to this distribution), as required for the MPR-Statistical Complexity calculation, we used a different one. For M-Gleason 3, the reference probability distribution is obtained by averaging the probability distributions of all samples labelled as Gleason 3 (M-Gleason 5 is calculated analogously). This is analogous to our approach in melanoma (Figure 5), in which we used normal and metastatic samples as reference sets for a modified statistical complexity. We observe that, even in this case, 02_003E and 03_063 continue to appear as outliers. In addition, we have observed that deleting these two samples did not significantly alter the identification of biomarkers.
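
    As a sketch of the reference-distribution idea described above (not the authors' code), the snippet below normalizes each sample's expression profile into a probability distribution, builds the M-Gleason 3 reference by averaging the distributions of the Gleason 3 samples, and computes each sample's Jensen-Shannon divergence to that reference; the toy data and labels are illustrative only.

    ```python
    import numpy as np

    def jensen_shannon(p, q, eps=1e-12):
        """Jensen-Shannon divergence (base 2) between two probability vectors."""
        p = p / p.sum()
        q = q / q.sum()
        m = 0.5 * (p + q)
        def kl(a, b):
            return np.sum(a * np.log2((a + eps) / (b + eps)))
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    # expr: samples x genes matrix of non-negative expression values (toy data here).
    rng = np.random.default_rng(0)
    expr = rng.random((6, 100))
    gleason3_idx = [0, 1, 2]  # hypothetical Gleason 3 labels

    # Each sample's expression profile, normalized into a probability distribution.
    dists = expr / expr.sum(axis=1, keepdims=True)

    # M-Gleason 3 reference: average of the Gleason 3 distributions.
    reference = dists[gleason3_idx].mean(axis=0)

    jsd_to_ref = np.array([jensen_shannon(d, reference) for d in dists])
    print(jsd_to_ref)
    ```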

  3. Energy Consumption of United States Over Time

    • kaggle.com
    Updated Dec 14, 2022
    Cite
    The Devastator (2022). Energy Consumption of United States Over Time [Dataset]. https://www.kaggle.com/datasets/thedevastator/unlocking-the-energy-consumption-of-united-state
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 14, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Energy Consumption of United States Over Time

    Building Energy Data Book

    By Department of Energy [source]

    About this dataset

    The Building Energy Data Book (2011) is an invaluable resource for gaining insight into the current state of energy consumption in the buildings sector. This dataset provides comprehensive data on residential, commercial, and industrial building energy consumption, construction techniques, building technologies, and characteristics. With this resource, you can get an in-depth understanding of how energy is used in various types of buildings - from single-family homes to large office complexes - as well as its impact on the environment. The BTO within the U.S. Department of Energy's Office of Energy Efficiency and Renewable Energy developed this dataset to provide a wealth of knowledge for researchers, policy makers, engineers, and everyday observers interested in learning more about our built environment and its energy usage patterns.

    How to use the dataset

    This dataset provides comprehensive information on energy consumption in the buildings sector of the United States. It contains a number of key variables that can be used to analyze and explore the relationships between energy consumption and building characteristics, technologies, and construction. The data is provided in both CSV and tabular formats, which makes it convenient for those who prefer programs like Excel or other statistical modeling software.

    In order to get started with this dataset we've developed a guide outlining how to effectively use it for your research or project needs.

    • Understand what's included: Before you start analyzing the data, you should read through the provided documentation so that you fully understand what is included in the datasets. You'll want to be aware of any potential limitations or requirements associated with each type of data point so that your results are valid and reliable when drawing conclusions from them.

    • Clean up any outliers: Take some time upfront to investigate suspicious outliers before using the dataset in further analyses; otherwise they can skew results later and make statistical modeling more difficult, since a single large outlier can distort a model's fitted distributions. Account for missing values as well; they are not always obvious when reviewing a table or plot, but they must be handled before accurate statistics can be obtained (see the sketch after this list).

    • Exploratory data analysis: After cleaning up your dataset, do some basic exploration by visualizing summaries such as boxplots, histograms, and scatter plots. This gives an initial sense of what trends might exist across demographic, geographic, and other groupings and variables, which can help inform future predictive models. This step will also highlight any clear discontinuities over time, helping ensure that predictors contribute meaningful signal rather than noise to the final predictions.

    • Analyze key metrics & observations: Once exploratory analyses have been carried out on the raw samples, post-processing steps follow, such as analyzing correlations among explanatory variables, performing significance tests on regression models, and imputing missing or outlier values, depending on the specific needs of the project, along with interpretation efforts based on the results.
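
    As referenced in the list above, here is a minimal pandas sketch of the outlier-cleaning and exploratory steps; the file name and the "Year" and "Consumption" columns are hypothetical placeholders for the actual Building Energy Data Book tables.

    ```python
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("building_energy_data_book.csv")  # hypothetical file name

    # Flag outliers with a simple IQR rule and drop rows with missing values.
    q1, q3 = df["Consumption"].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = df["Consumption"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    clean = df[mask].dropna()

    # Quick exploratory views: distribution and trend over time.
    clean["Consumption"].plot.hist(bins=30)
    clean.plot.scatter(x="Year", y="Consumption")
    plt.show()
    ```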

    Research Ideas

    • Creating an energy efficiency rating system for buildings - Using the dataset, an organization can develop a metric to rate the energy efficiency of commercial and residential buildings in a standardized way.
    • Developing targeted campaigns to raise awareness about energy conservation - Analyzing data from this dataset can help organizations identify areas of high energy consumption and create targeted campaigns and incentives to encourage people to conserve energy in those areas.
    • Estimating costs associated with upgrading building technologies - By evaluating various trends in building technologies and their associated costs, decision-makers can determine the most cost-effective option when it comes time to upgrade their structures' energy efficiency...
  4. Homestays data

    • kaggle.com
    Updated May 25, 2024
    Cite
    Priyanshu shukla (2024). Homestays data [Dataset]. https://www.kaggle.com/datasets/priyanshu594/homestays-data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 25, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Priyanshu shukla
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Objective: Build a robust predictive model to estimate the log_price of homestay listings based on a comprehensive analysis of their characteristics, amenities, and host information. First make sure that the entire dataset is clean and ready to be used.

    1. Feature Engineering: Enhance the dataset by creating actionable and insightful features. Calculate Host_Tenure by determining the number of years from host_since to the current date, providing a measure of host experience. Generate Amenities_Count by counting the items listed in the amenities array to quantify property offerings. Determine Days_Since_Last_Review by calculating the days between last_review and today to assess listing activity and relevance (see the sketch after this list).
    2. Exploratory Data Analysis (EDA): Conduct a deep dive into the dataset to uncover underlying patterns and relationships. Analyze how pricing (log_price) correlates with both categorical features (such as room_type and property_type) and numerical features (such as accommodates and number_of_reviews). Use statistical tools and visualizations such as correlation matrices, histograms for distribution analysis, and scatter plots to explore relationships between variables.
    3. Geospatial Analysis: Investigate the geographical data to understand regional pricing trends. Plot listings on a map using latitude and longitude data to visually assess price distribution. Examine whether certain neighbourhoods or proximity to city centres influence pricing, providing a spatial perspective on the pricing strategy.
    4. Sentiment Analysis on Textual Data: Apply natural language processing techniques to the description texts to extract sentiment scores. Use sentiment analysis tools to determine whether positive or negative descriptions influence listing prices, incorporating these findings into the predictive model as a feature.
    5. Amenities Analysis: Thoroughly parse and analyse the amenities provided in the listings. Identify which amenities are most associated with higher or lower prices by applying statistical tests to determine correlations, thereby informing both pricing strategy and model inputs.
    6. Categorical Data Encoding: Convert categorical data into a format suitable for machine learning. Apply one-hot encoding to variables like room_type, city, and property_type, ensuring that the model can interpret these as distinct features without any ordinal implication.
    7. Model Development and Training: Design and train predictive models to estimate log_price. Begin with a simple linear regression to establish a baseline, then explore more complex models such as RandomForest and GradientBoosting to better capture non-linear relationships and interactions between features. Briefly document the model-building process within the Jupyter notebook, specifying the choice of algorithms and the rationale.
    8. Model Optimization and Validation: Systematically optimize the models to achieve the best performance. Employ techniques like grid search to experiment with different hyperparameter settings. Validate model choices through techniques like k-fold cross-validation, ensuring the model generalizes well to unseen data.
    9. Feature Importance and Model Insights: Analyze the trained models to identify which features most significantly impact log_price. Use model-specific methods like feature importance scores for tree-based models and SHAP values for an in-depth understanding of feature contributions.
    10. Predictive Performance Assessment: Critically evaluate the performance of the final model on a reserved test set. Use metrics such as Root Mean Squared Error (RMSE) and R-squared to assess accuracy and goodness of fit. Provide a detailed analysis of the residuals to check for any patterns that might suggest model biases or misfit.
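
    A minimal sketch of the feature-engineering step (step 1) is given below; the file name is hypothetical, the host_since, amenities, and last_review columns come from the task description, and the amenities field is assumed to be a comma-separated string such as "{Wifi,Kitchen,Heating}".

    ```python
    import pandas as pd

    # Hypothetical file name; adjust to the actual Homestays CSV.
    df = pd.read_csv("homestays.csv", parse_dates=["host_since", "last_review"])
    today = pd.Timestamp.today()

    # Host_Tenure: years of hosting experience.
    df["Host_Tenure"] = (today - df["host_since"]).dt.days / 365.25

    # Amenities_Count: number of items listed in the amenities field.
    df["Amenities_Count"] = (
        df["amenities"].fillna("").str.strip("{}").str.split(",").str.len()
    )

    # Days_Since_Last_Review: recency of listing activity.
    df["Days_Since_Last_Review"] = (today - df["last_review"]).dt.days
    ```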

  5. Data_Sheet_7_“R” U ready?: a case study using R to analyze changes in gene...

    • frontiersin.figshare.com
    docx
    Updated Mar 22, 2024
    + more versions
    Cite
    Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder (2024). Data_Sheet_7_“R” U ready?: a case study using R to analyze changes in gene expression during evolution.docx [Dataset]. http://doi.org/10.3389/feduc.2024.1379910.s007
    Explore at:
    Available download formats: docx
    Dataset updated
    Mar 22, 2024
    Dataset provided by
    Frontiers
    Authors
    Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The story line introduces a recent graduate hired at a biotech firm and tasked with analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially this involves a step-by-step analysis of the small Iris dataset built into R which includes sepal and petal length of three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolutionary experiments, developing their skills in manipulating and interpreting large scientific datasets through visualizations and statistical analysis. Prerequisite knowledge is basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R, rather than creation of original code. Pilot testing showed the case study was well-received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.

  6. Data from: LBA-ECO CD-34 Hyperion 30-m Surface Reflectance, Amazon Basin:...

    • datadiscoverystudio.org
    pl
    Updated Jul 11, 2018
    + more versions
    Cite
    (2018). LBA-ECO CD-34 Hyperion 30-m Surface Reflectance, Amazon Basin: 2002-2005. [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/63c9bec65a2845fcadcd47e779bf683e/html
    Explore at:
    Available download formats: pl
    Dataset updated
    Jul 11, 2018
    Description

    ABSTRACT: This data set contains 20 multispectral surface reflectance images collected by the EO-1 satellite Hyperion sensor at 30-m resolution and covering the entire Amazon Basin for 2002 - 2005. All images were converted to GeoTiff format for distribution. The respective ENVI *.hdr files are included as companion files and contain image projection and band information. The selected multispectral images were processed using ENVI software as described in Chambers et al. (2009). Bands with uncalibrated wavelengths and those with low spectral response were removed, leaving a spectral subset of generally 196 bands (some images have fewer). A cloud mask was developed using 2-d scatter plots of variable reflectance bands to highlight clouds as regions of interest (ROIs), allowing clouds and cloud edges to be masked. A de-streaking algorithm was then applied to the image to reduce variance in balance between the vertical columns. Apparent surface reflectance was calculated for this balanced image using the atmospheric correction algorithm ACORN in 1.5pb mode (AIG-LLC, Boulder, CO). The images (18 of the 20) were georeferenced using the corresponding Advanced Land Imager (ALI) satellite images.
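
    The cloud masking above was performed interactively in ENVI using 2-d scatter plots and ROIs; as a rough, simplified analogue only, the numpy sketch below thresholds a visible and a near-infrared reflectance band, with band indices and thresholds that are illustrative assumptions rather than values from the original processing.

    ```python
    import numpy as np

    def simple_cloud_mask(reflectance, vis_band=10, nir_band=40,
                          vis_thresh=0.25, nir_thresh=0.4):
        """reflectance: array of shape (bands, rows, cols) of apparent surface
        reflectance. Returns a boolean mask that is True where a pixel is
        bright in both a visible and a near-infrared band (likely cloud)."""
        vis = reflectance[vis_band]
        nir = reflectance[nir_band]
        return (vis > vis_thresh) & (nir > nir_thresh)

    # Toy usage with random data standing in for a 196-band Hyperion subset.
    cube = np.random.rand(196, 100, 100)
    mask = simple_cloud_mask(cube)
    masked_cube = np.where(mask[None, :, :], np.nan, cube)
    ```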

  7. Data from: Online Shopping Consumer Behavior Dataset

    • kaggle.com
    Updated Dec 21, 2023
    Cite
    The Devastator (2023). Online Shopping Consumer Behavior Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/online-shopping-consumer-behavior-dataset/suggestions
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 21, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    Description

    Online Shopping Consumer Behavior Dataset

    Consumer Buying Patterns in E-Commerce

    By Weitong Li [source]

    About this dataset

    This dataset is a rich compilation of data that thoroughly guides us through consumers' behavior and their buying intentions while engaged in online shopping. It has been constructed with immense care to ensure it effectively examines an array of factors that influence customers' purchasing intentions in the increasingly significant realm of digital commerce.

    The dataset was composed with careful attention to collecting a diverse set of information, allowing a broad view into what affects online shopping behavior. The columns cover customers' existing awareness of the website or source from which they are shopping, their information about the products they wish to purchase, and, importantly, their satisfaction with previous purchases.

    Additionally, the dataset investigates both objective and subjective aspects of customer behavior online. It includes data on webpage factors such as loading speed, interface design, and aesthetics, which can significantly influence the consumer's decision-making during online shopping, as well as the convenience of completing and submitting orders on those websites.

    To understand consumer behavior in an online environment from multiple facets, individual consumers' subjective views are also captured: how much they trust an e-commerce site, whether they find it more convenient to shop on these platforms than through traditional methods, and whether they feel relaxed when doing so.

    Recognizing how strongly product competitiveness influences buyer intention, the dataset also includes columns on characteristics such as price comparisons against offline stores and against similar products on other websites.

    Overall, this aggregated data collection aims not only at understanding fundamental consumer preferences but also at predicting future buying behaviors, enabling businesses to capitalize on emerging trends in online retail more efficiently and profitably.

    How to use the dataset

    In an online-focused world, understanding consumer behavioral data is crucial. The 'Online Shopping Purchasing Intention Dataset' provides a comprehensive collection of consumer-based insights based on their behavior in virtual shopping environments. This dataset explores various factors that might affect a customer's decision to purchase. Here's how you can harness this dataset:

    Defining the Problem

    Identify a problem or question this data may answer. This might be: understanding what factors influence buying decisions, predicting whether a visit will result in a purchase based on user behavior, analyzing the impact of the month, operating system or traffic type on online purchasing intention etc.

    Data Exploration

    Understand the structure of the dataset by getting to know each variable and its meaning:

    • Administrative: counts the different administrative pages visited by the user in that session.
    • Informational & Product Related: measure how many informational and product-related pages were viewed.
    • Bounce Rates, Exit Rate, Page Values: metrics that provide significant insight into visitor activity.
    • Special Day: explore the correlation between proximity to special days (like Mother's Day and Valentine's Day) and transactions.
    • Operating Systems / Browser / Region / Traffic Type: uncover behavioral patterns associated with technical specs, geographic location, and source of traffic.

    Analysis and Visualization

    Use appropriate statistical techniques, such as correlation analysis or chi-square tests for independence, to examine relationships between variables.

    Visualize your findings using plots such as bar graphs for comparing categorical features or scatter plots for multivariate relationships.
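
    A small sketch of this analysis step follows; the file name and the "Revenue" purchase-outcome column are assumptions, while the other column names come from the variable list above.

    ```python
    import pandas as pd
    from scipy.stats import chi2_contingency
    import matplotlib.pyplot as plt

    df = pd.read_csv("online_shoppers_intention.csv")  # hypothetical file name

    # Chi-square test: is TrafficType independent of whether a purchase was made?
    table = pd.crosstab(df["TrafficType"], df["Revenue"])
    chi2, p_value, dof, expected = chi2_contingency(table)
    print(f"chi2={chi2:.2f}, p={p_value:.4f}")

    # Scatter plot of two continuous visitor-activity metrics.
    df.plot.scatter(x="BounceRates", y="ExitRates", alpha=0.3)
    plt.show()
    ```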

    Model Building

    Machine learning algorithms (like logistic regression or decision tree models) are potentially useful if your goal is predicting purchase intention from the given features.

    This can also involve feature selection (choosing the most relevant predictors), training and testing the model, and finally assessing model performance through metrics like accuracy, precision, and recall.

    Remember to handle any missing values appropriately before diving into predictive modeling; a baseline sketch follows below.
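
    A baseline modeling sketch under the same assumptions (hypothetical file name and "Revenue" target column; predictors taken from the exploration step above):

    ```python
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, precision_score, recall_score

    df = pd.read_csv("online_shoppers_intention.csv").dropna()

    # Numeric session metrics named in the exploration step, used as predictors.
    features = ["Administrative", "Informational", "ProductRelated",
                "BounceRates", "ExitRates", "PageValues", "SpecialDay"]
    X = df[features]
    y = df["Revenue"].astype(int)  # assumed binary purchase outcome

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y)

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)

    print("accuracy :", accuracy_score(y_test, pred))
    print("precision:", precision_score(y_test, pred))
    print("recall   :", recall_score(y_test, pred))
    ```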

    The comprehens...
