18 datasets found
  1. NOAA R/V Ron Brown Fourier Transform Infrared Spectroscopy (FTIR) Data

    • data.ucar.edu
    ascii
    Updated Oct 7, 2025
    Cite
    Lynn Russell (2025). NOAA R/V Ron Brown Fourier Transform Infrared Spectroscopy (FTIR) Data [Dataset]. http://doi.org/10.26023/87N8-35T6-RE0C
    Explore at:
    Available download formats: ascii
    Dataset updated
    Oct 7, 2025
    Dataset provided by
    NSF NCAR Earth Observing Laboratory
    Authors
    Lynn Russell
    Time period covered
    Oct 21, 2008 - Nov 29, 2008
    Area covered
    Description

    This file contains the Fourier Transform Infrared Spectroscopy (FTIR) data from the NOAA R/V Ronald H. Brown during VOCALS-REx 2008.

  2. Coordinate transform R script from Variation in farming damselfish behaviour...

    • datasetcatalog.nlm.nih.gov
    Updated May 15, 2024
    Cite
    Keith, Sally A.; Williams, Gareth J.; Boström-Einarsson, Lisa; Exton, Dan A.; Sheppard, Catherine E. (2024). Coordinate transform R script from Variation in farming damselfish behaviour creates a competitive landscape of risk on coral reefs [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001493664
    Explore at:
    Dataset updated
    May 15, 2024
    Authors
    Keith, Sally A.; Williams, Gareth J.; Boström-Einarsson, Lisa; Exton, Dan A.; Sheppard, Catherine E.
    Description

    R code for transforming SVS coordinates to the orthomosaic coordinate system.

  3. Data From: Influence of Hydrological Perturbations and Riverbed Sediment...

    • dataone.org
    Updated Nov 3, 2021
    Cite
    Michelle Newcomer; Susan Hubbard (2021). Data From: Influence of Hydrological Perturbations and Riverbed Sediment Characteristics on Hyporheic Zone Respiration of CO2 and N-2, Journal of Geophysical Research-Biogeosciences [Dataset]. http://doi.org/10.21952/WTR/1508398
    Explore at:
    Dataset updated
    Nov 3, 2021
    Dataset provided by
    ESS-DIVE
    Authors
    Michelle Newcomer; Susan Hubbard
    Time period covered
    May 1, 2012 - Aug 1, 2016
    Area covered
    Description

    This data package contains pumping data (.txt), parameter matrices, and R code (.R, .RData) to perform bootstrapping for parameter selection for the bioclogging model development. The pumping data were collected from 2010 to 2017 at three riverbank collection wells located alongside the study site, the Russian River Riverbank Filtration site in Sonoma County, California. The pumping data are directly correlated with water table oscillations, so the code performs these correlations and simulates stochastic versions of water table oscillations. See Metadata Description.pdf for full details on dataset production. This dataset must be used with the R programming language. This dataset and R code are associated with the publication "Influence of Hydrological Perturbations and Riverbed Sediment Characteristics on Hyporheic Zone Respiration of CO2 and N-2". This research was supported by the Jane Lewis Fellowship from the University of California, Berkeley, the Sonoma County Water Agency (SCWA), the Roy G. Post Foundation Scholarship, the U.S. Department of Energy, Office of Science Graduate Student Research (SCGSR) Program, U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research under award DE-AC02-05CH11231, and the UFZ-Helmholtz Centre for Environmental Research, Leipzig, Germany.

  4. Data and R-script for a tutorial that explains how to convert spreadsheet...

    • data-staging.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    Goedhart, Joachim (2024). Data and R-script for a tutorial that explains how to convert spreadsheet data to tidy data. [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_4056965
    Explore at:
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    SILS - UvA
    Authors
    Goedhart, Joachim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data and R-script for a tutorial that explains how to convert spreadsheet data to tidy data. The tutorial is published in a blog for The Node (https://thenode.biologists.com/converting-excellent-spreadsheets-tidy-data/education/)
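As a rough illustration of what converting spreadsheet-style (wide) data to tidy (long) data involves, here is a minimal Python sketch. The tutorial itself uses an R script; the function name `wide_to_long` and the column names in the example are hypothetical and are not taken from the dataset.

```python
def wide_to_long(rows, id_column, variable_name="condition", value_name="value"):
    """Turn one-column-per-condition rows into one-observation-per-row records.

    All names here are illustrative assumptions, not the tutorial's own.
    """
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every tidy row instead
            tidy.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return tidy


wide = [{"cell": 1, "control": 0.5, "treated": 0.9}]
tidy = wide_to_long(wide, id_column="cell")
```

Each measurement ends up on its own row, with the former column header stored as a value in the `condition` column.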

  5. Experiment files and measurement parameters for Bruker Invenio-R

    • data.mendeley.com
    Updated Feb 6, 2024
    Cite
    Mohd Rashidi Abdull Manap (2024). Experiment files and measurement parameters for Bruker Invenio-R [Dataset]. http://doi.org/10.17632/rp8nthpx4f.1
    Explore at:
    Dataset updated
    Feb 6, 2024
    Authors
    Mohd Rashidi Abdull Manap
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The experiment file (.xpm) stores the settings and values of the advanced parameters (e.g., resolution, sample scan time, background scan time, and the spectral range to be used). Meanwhile, the phase resolution is stored in the FT. The optic parameters for this experimental condition are shown as well.

  6. ACSM2B-like proteins transform ST to ST-CoA

    • reactome.org
    biopax2, biopax3 +5
    Cite
    Bijay Jassal, ACSM2B-like proteins transform ST to ST-CoA [Dataset]. https://reactome.org/content/detail/R-HSA-159567
    Explore at:
    Available download formats: sbgn, biopax2, owl, sbml, biopax3, docx, pdf
    Dataset provided by
    Ontario Institute for Cancer Research
    Authors
    Bijay Jassal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Salicylate (ST) and ATP react with coenzyme A to form salicylate-CoA (ST-CoA), AMP, and pyrophosphate in a reaction catalyzed by xenobiotic/medium-chain fatty acid:CoA ligase (Vessey et al. 2003).

  7. Infrared spectra of particles were obtained from marine sediment samples in...

    • data.mendeley.com
    Updated Jan 12, 2024
    + more versions
    Cite
    Mohd Rashidi Abdull Manap (2024). Infrared spectra of particles were obtained from marine sediment samples in July 2023 using the Alpha, Lumos, and Invenio-R spectrometers [Dataset]. http://doi.org/10.17632/hvj6y7bjv3.1
    Explore at:
    Dataset updated
    Jan 12, 2024
    Authors
    Mohd Rashidi Abdull Manap
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    206 infrared spectra of particles were obtained from marine sediment samples in July 2023 using the Alpha, Lumos, and Invenio-R spectrometers.

  8. FIFa21 Messy Dataset cleaned and transformed

    • kaggle.com
    zip
    Updated Feb 26, 2024
    Cite
    Nicolas Mora Hansen (2024). FIFa21 Messy Dataset cleaned and transformed [Dataset]. https://www.kaggle.com/datasets/nicolasmorahansen/fifa21-messy-dataset-cleaned-and-transformed
    Explore at:
    Available download formats: zip (5473572 bytes)
    Dataset updated
    Feb 26, 2024
    Authors
    Nicolas Mora Hansen
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    FIFA21 - Data Cleaning and Transformation

    EA Sports FIFA 21 is a popular video game that simulates football matches. Data collected from this game is often messy, containing inconsistencies, missing values, and various formatting issues.

    For this project, I will attempt to clean, organize, and prepare this messy FIFA_21 data for analysis using just Excel. Although it could be done somewhat faster using Python, R, or other programming languages, the challenge at hand is to use Excel.

    Observations(Rows)=18980

    1. 'Spot blank values'.'COUNTBLANK'.

    Column 'Loan Date End' has '17966' blanks.

    2. 'Spot 'zero' values'. 'COUNTIF'.

    =COUNTIF(A1:A18980; "=0")

    'Value', 'Wage', 'Release Clause', 'Hits' have '0' values.

    3.'Column Headers'

    =SUBSTITUTE(A1; " "; "_")

    Unique_Attributes(columns)=76

    1.'Height'

    At first glance, the height column looked like it needed only a simple formula to turn a string ending in 'cm' into a number of centimeters, but some values were given in feet and inches instead, written with an apostrophe (') for feet and a double quote (") for inches, which called for a more intricate formula. The 'IF' formula checks whether the string contains 'cm'. If it does, the 'cm' suffix is removed and the number is kept. Otherwise, the digits before the apostrophe (the feet) are multiplied by 12 to give inches, the digits between the apostrophe and the double quote (the inches) are added, and the total is multiplied by 2.54 and rounded to whole centimeters.

    =IF(ISNUMBER(FIND("cm";$O2)); VALUE(SUBSTITUTE($O2; "cm"; "")); ROUND((LEFT($O2; FIND("'"; $O2) - 1) * 12 + MID($O2; FIND("'"; $O2) + 1; FIND(""""; $O2) - FIND("'"; $O2) - 1)) * 2,54;0))
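For readers who want to check the logic of the spreadsheet formula outside Excel, the same conversion can be sketched in Python. This is only an illustration under assumptions: the function name `height_to_cm` is hypothetical, and the input is assumed to look like either `182cm` or `6'2"`.

```python
import re

CM_PER_INCH = 2.54


def height_to_cm(raw: str) -> int:
    """Convert a height string such as '182cm' or 6'2" to whole centimeters."""
    text = raw.strip()
    if text.lower().endswith("cm"):
        # Already metric: drop the suffix and keep the number.
        return round(float(text[:-2]))
    match = re.fullmatch(r"(\d+)'(\d+)\"", text)
    if match is None:
        raise ValueError(f"unrecognized height: {raw!r}")
    feet, inches = int(match.group(1)), int(match.group(2))
    # Feet become inches, the inches are added, and the total becomes centimeters.
    return round((feet * 12 + inches) * CM_PER_INCH)
```

One difference to keep in mind: Python's `round()` sends exact halves to the nearest even integer, whereas Excel's `ROUND` rounds them away from zero.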

    2.'Weight'

    Weight was recorded in both 'Kg' and 'Lbs'. For 'Kg', the value is converted to a number. For 'Lbs', the value is converted to 'Kg' and then to a number. The result is rounded to zero decimal places.

    =ROUND(IF(ISNUMBER(FIND("kg";$P2));VALUE(SUBSTITUTE($P2;"kg";""))*1;IF(ISNUMBER(FIND("lbs";$P2));VALUE(SUBSTITUTE($P2;"lbs";""))/2,205;0));0)
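A Python sketch of the same weight rule, for comparison. The function name `weight_to_kg` is hypothetical; the conversion factor 2.205 and the fallback of 0 for unrecognized strings are copied from the formula above.

```python
LBS_PER_KG = 2.205  # same factor as the spreadsheet formula


def weight_to_kg(raw: str) -> int:
    """Convert '72kg' or '159lbs' to whole kilograms; return 0 if neither unit is found."""
    text = raw.strip().lower()
    if text.endswith("kg"):
        return round(float(text[:-2]))
    if text.endswith("lbs"):
        return round(float(text[:-3]) / LBS_PER_KG)
    return 0  # mirrors the formula's final fallback branch
```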

    3.'Joined'

    A new column is added to the right of 'Joined' by the name 'WithClub10Years'. This column shows whether the player has been at the same club for a minimum of 10 years.

    =IF(YEAR(NOW())-YEAR(T2)>=10; "10 Years"; "")
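The year comparison in this formula can be sketched in Python as follows. The function name `with_club_10_years` is hypothetical, and `today` is passed in explicitly so the result is reproducible.

```python
from datetime import date


def with_club_10_years(joined: date, today: date) -> str:
    """Return '10 Years' when the calendar-year difference is at least 10, else ''."""
    return "10 Years" if today.year - joined.year >= 10 else ""


flag = with_club_10_years(date(2010, 7, 1), date(2020, 1, 1))
```

Like the formula, this compares calendar years only, so a player who joined in July 2010 is already flagged in January 2020, slightly before ten full years have passed.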

    4.'Value', 'Wage', 'Release Clause'

    The monetary figures were converted into purely numerical values. The values are in euros. The 'M' and 'K' suffixes were removed and the remaining figure was multiplied by 1,000,000 or 1,000 respectively. The decimal delimiter was changed from '.' to ',' for calculation.

    =IF(ISNUMBER(FIND("M"; Z2)); VALUE(SUBSTITUTE(Z2; "M"; ""))*1000000; IF(ISNUMBER(FIND("K"; Z2)); VALUE(SUBSTITUTE(Z2; "K"; ""))*1000; Z2*1))
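The suffix handling can also be sketched in Python. The function name `money_to_number` is hypothetical, and the leading euro sign in the example inputs is an assumption about how the raw strings look; the source only states that the values are in euros.

```python
def money_to_number(raw: str) -> float:
    """Convert strings such as '€103.5M' or '€560K' to plain numbers."""
    text = raw.strip().lstrip("€")  # assumed currency prefix, removed if present
    multipliers = {"M": 1_000_000, "K": 1_000}
    suffix = text[-1:].upper()
    if suffix in multipliers:
        return float(text[:-1]) * multipliers[suffix]
    return float(text)  # no suffix: the value is already a plain number
```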

    5.'W/F', 'SM', 'IR'

    The values included star symbols. The stars were removed and the strings were converted to numbers.

    =LEFT(BO2; 1)

    Conclusion

    The clean dataset is now ready for further analysis, such as exploring player statistics, team performance, or other insights that can provide a deeper understanding of the FIFA 21 game.

  9. Huge US 514 Stocks + 1298 columns Market Data 25Gb

    • kaggle.com
    zip
    Updated Jan 2, 2024
    Cite
    Oleg Shpagin (2024). Huge US 514 Stocks + 1298 columns Market Data 25Gb [Dataset]. https://www.kaggle.com/datasets/olegshpagin/extra-us-stocks-market-data
    Explore at:
    Available download formats: zip (8646680017 bytes)
    Dataset updated
    Jan 2, 2024
    Authors
    Oleg Shpagin
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    United States
    Description

    Huge US stock prices + 1292 columns of extra data from indicators. This dataset provides historical Open, High, Low, Close, and Volume (OHLCV) prices of stocks traded in the United States financial markets, plus 1292 calculated indicator columns. You can use all this huge data for stock price predictions.

    Columns with Momentum Indicator values

    • ADX - Average Directional Movement Index
    • ADXR - Average Directional Movement Index Rating
    • APO - Absolute Price Oscillator
    • AROON - Aroon
    • AROONOSC - Aroon Oscillator
    • BOP - Balance Of Power
    • CCI - Commodity Channel Index
    • CMO - Chande Momentum Oscillator
    • DX - Directional Movement Index
    • MACD - Moving Average Convergence/Divergence
    • MACDEXT - MACD with controllable MA type
    • MACDFIX - Moving Average Convergence/Divergence Fix 12/26
    • MFI - Money Flow Index
    • MINUS_DI - Minus Directional Indicator
    • MINUS_DM - Minus Directional Movement
    • MOM - Momentum
    • PLUS_DI - Plus Directional Indicator
    • PLUS_DM - Plus Directional Movement
    • PPO - Percentage Price Oscillator
    • ROC - Rate of change: ((price/prevPrice)-1)*100
    • ROCP - Rate of change Percentage: (price-prevPrice)/prevPrice
    • ROCR - Rate of change ratio: (price/prevPrice)
    • ROCR100 - Rate of change ratio 100 scale: (price/prevPrice)*100
    • RSI - Relative Strength Index
    • STOCH - Stochastic
    • STOCHF - Stochastic Fast
    • STOCHRSI - Stochastic Relative Strength Index
    • TRIX - 1-day Rate-Of-Change (ROC) of a Triple Smooth EMA
    • ULTOSC - Ultimate Oscillator
    • WILLR - Williams' %R

    Columns with Volatility Indicator values

    • ATR - Average True Range
    • NATR - Normalized Average True Range
    • TRANGE - True Range

    Columns with Volume Indicator values

    • AD - Chaikin A/D Line
    • ADOSC - Chaikin A/D Oscillator
    • OBV - On Balance Volume

    Columns with Overlap Studies values

    • BBANDS - Bollinger Bands
    • DEMA - Double Exponential Moving Average
    • EMA - Exponential Moving Average
    • HT_TRENDLINE - Hilbert Transform - Instantaneous Trendline
    • KAMA - Kaufman Adaptive Moving Average
    • MA - Moving average
    • MAMA - MESA Adaptive Moving Average
    • MAVP - Moving average with variable period
    • MIDPOINT - MidPoint over period
    • MIDPRICE - Midpoint Price over period
    • SAR - Parabolic SAR
    • SAREXT - Parabolic SAR - Extended
    • SMA - Simple Moving Average
    • T3 - Triple Exponential Moving Average (T3)
    • TEMA - Triple Exponential Moving Average
    • TRIMA - Triangular Moving Average
    • WMA - Weighted Moving Average

    Columns with Cycle Indicator values

    • HT_DCPERIOD - Hilbert Transform - Dominant Cycle Period
    • HT_DCPHASE - Hilbert Transform - Dominant Cycle Phase
    • HT_PHASOR - Hilbert Transform - Phasor Components
    • HT_SINE - Hilbert Transform - SineWave
    • HT_TRENDMODE - Hilbert Transform - Trend vs Cycle Mode
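The four rate-of-change variants listed under the momentum indicators (ROC, ROCP, ROCR, ROCR100) are given with explicit formulas, so they can be written out directly. The sketch below simply transcribes those formulas for a single pair of prices; the function name `rate_of_change` is hypothetical, and the lookback period used to pick `prev_price` in the dataset is not stated in the description.

```python
def rate_of_change(price: float, prev_price: float) -> dict:
    """Compute the four rate-of-change variants from the listed formulas."""
    ratio = price / prev_price
    return {
        "ROC": (ratio - 1) * 100,                   # ((price/prevPrice)-1)*100
        "ROCP": (price - prev_price) / prev_price,  # (price-prevPrice)/prevPrice
        "ROCR": ratio,                              # price/prevPrice
        "ROCR100": ratio * 100,                     # (price/prevPrice)*100
    }


example = rate_of_change(110.0, 100.0)
```

For a move from 100 to 110, ROC is about 10, ROCP about 0.1, ROCR about 1.1, and ROCR100 about 110, up to floating-point rounding.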

    If you want to download actual data - on today for example, then you can use python code from my github. tickers = ['CE.US', 'WELL.US', 'GRMN.US', 'IEX.US', 'CAG.US', 'BEN.US', 'ATO.US', 'WY.US', 'TSCO.US', 'COR.US', 'MOS.US', 'SWKS.US', 'ORCL.US', 'URI.US', 'INCY.US', 'MPC.US', 'HD.US', 'PPG.US', 'NUE.US', 'DDOG.US', 'HSIC.US', 'CAT.US', 'HSY.US', 'MKTX.US', 'CCEP.US', 'GWW.US', 'LEN.US', 'IFF.US', 'GL.US', 'MDB.US', 'SNPS.US', 'KR.US', 'DVN.US', 'SYY.US', 'USB.US', 'DRI.US', 'PARA.US', 'FMC.US', 'UBER.US', 'WRK.US', 'DLR.US', 'SO.US', 'AMGN.US', 'MA.US', 'STT.US', 'BWA.US', 'KVUE.US', 'GFS.US', 'BBY.US', 'BK.US', 'MRVL.US', 'VFC.US', 'EIX.US', 'ADSK.US', 'ZBH.US', 'MU.US', 'HUBB.US', 'PEAK.US', 'CVX.US', 'CPB.US', 'GILD.US', 'BXP.US', 'DD.US', 'MCD.US', 'KDP.US', 'GE.US', 'PKG.US', 'HST.US', 'WTW.US', 'XOM.US', 'ED.US', 'SPG.US', 'PFG.US', 'LVS.US', 'FAST.US', 'ROST.US', 'TTD.US', 'CNC.US', 'PGR.US', 'CMI.US', 'TEAM.US', 'MELI.US', 'BKR.US', 'EBAY.US', 'CPRT.US', 'MSFT.US', 'HOLX.US', 'ABBV.US', 'AMZN.US', 'FE.US', 'WYNN.US', 'KMI.US', 'APA.US', 'CRWD.US', 'DPZ.US', 'EQT.US', 'NOC.US', 'TAP.US', 'ETR.US', 'T.US', 'OMC.US', 'MTCH.US', 'TRMB.US', 'EXPE.US', 'DTE.US', 'PNR.US', 'LH.US', 'ALL.US', 'CTRA.US', 'VMC.US', 'XRAY.US', 'NWS.US', 'GOOGL.US', 'WEC.US', 'BIIB.US', 'LLY.US', 'BMY.US', 'STE.US', 'NI.US', 'MKC.US', 'AMT.US', 'CFG.US', 'LW.US', 'HIG.US', 'ETSY.US', 'AON.US', 'ULTA.US', 'DVA.US', 'LKQ.US', 'MPWR.US', 'TEL.US', 'FICO.US', 'CVS.US', 'CMA.US', 'NVDA.US', 'TDG.US', 'AWK.US', 'PSA.US', 'FOXA.US', 'ON.US', 'ODFL.US', 'NVR.US', 'ROP.US', 'TFX.US', 'HLT.US', 'EXPD.US', 'FOX.US', 'D.US', 'AMAT.US', 'AZO.US', 'DLTR.US', 'TT.US', 'SBUX.US', 'JNJ.US', 'HAS.US', 'DASH.US', 'NRG.US', 'JNPR.US', 'BIO.US', 'AMD.US', 'NFLX.US', 'VLTO.US', 'BRO.US', 'REGN.US', 'WRB.US', 'LRCX.US', 'SYK.US', 'MCO.US', 'CSGP.US', 'TROW.US', 'ETN.US', 'RTX.US', 'CRM.US', 'SIRI.US', 'UPS.US', 'HES.US', 'RSG.US', 'PEP.US', 'MET.US', 'HON.US', 'IQV.US', 'JPM.US', 'DG.US', 'CBRE.US', 
'NDSN.US', 'DOW.US', 'SBAC.US', 'TSN.US', 'IT.US', 'WM.US', 'TPR.US', 'IBM.US', 'CHTR.US', 'HAL.US', 'ROL.US', 'FDS.US', 'SHW.US', 'EW.US', 'RJF.US', 'APH.US', 'AIZ.US', 'ZBRA.US', 'SRE.US', 'CTAS.US', 'PXD.US', 'MTD.US', 'NOW.US', 'MAS.US', 'FFIV.US', 'ELV.US', 'SYF.US', 'CSCO.US', 'APTV...

  10. Greg Kolodziejzyk's 13-year associative remote viewing experiment results...

    • zenodo.org
    csv
    Updated Nov 14, 2024
    Cite
    Greg Kolodziejzyk; Greg Kolodziejzyk (2024). Greg Kolodziejzyk's 13-year associative remote viewing experiment results (session-level data) [Dataset]. http://doi.org/10.5281/zenodo.14165838
    Explore at:
    Available download formats: csv
    Dataset updated
    Nov 14, 2024
    Dataset provided by
    Psi Open Data
    Authors
    Greg Kolodziejzyk; Greg Kolodziejzyk
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    This dataset is a transformation of Greg Kolodziejzyk's remote viewing data (see Related datasets below). Greg used a "rapid-fire" technique whereby several short free-response remote viewing trials were completed in a single session. The trial-level data was transformed by Adrian Ryan into session-level Z-scores by exact binomial, in order that the data could be combined with those from other experiments, for the analyses reported here:

    • Ryan, A., & Spottiswoode, J. (2015). Variation of ESP by Season, Local Sidereal Time, and Geomagnetic Activity. Extrasensory Perception, 377-394.

    Three files are included:

    • Data file
    • Code book
    • Transform Procedure: The R script used to transform the original trial-level data into session-level data.
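The description says trial-level results were converted to session-level Z-scores "by exact binomial". The actual procedure is in the R script included with the dataset; the Python sketch below is only one plausible reading, under loudly stated assumptions: an upper-tail exact binomial probability is computed and then mapped to a standard normal quantile. The function name `exact_binomial_z`, the tail direction, and the chance hit rate `p_chance` are all assumptions.

```python
from math import comb
from statistics import NormalDist


def exact_binomial_z(hits: int, trials: int, p_chance: float) -> float:
    """Z-score for seeing at least `hits` successes in `trials` by chance alone.

    Assumed approach, not the dataset's documented method.
    """
    # Exact upper-tail binomial probability P(X >= hits).
    p_value = sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )
    # Map the tail probability onto the standard normal scale.
    return NormalDist().inv_cdf(1 - p_value)


z = exact_binomial_z(hits=8, trials=10, p_chance=0.5)
```

With 8 hits in 10 trials at a chance rate of 0.5, the tail probability is 56/1024 and the Z-score is roughly 1.6. A session with zero hits has a tail probability of exactly 1, which `inv_cdf` rejects, so a real implementation would need a rule for that edge case.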
  11. Defective TALDO1 does not transform Fru(6)P, E4P to SH7P, GA3P

    • reactome.org
    biopax2, biopax3 +5
    Updated Oct 9, 2013
    + more versions
    Cite
    Peter D'Eustachio (2013). Defective TALDO1 does not transform Fru(6)P, E4P to SH7P, GA3P [Dataset]. https://reactome.org/content/detail/R-HSA-5659998
    Explore at:
    Available download formats: docx, biopax3, sbml, owl, biopax2, pdf, sbgn
    Dataset updated
    Oct 9, 2013
    Dataset provided by
    NYU School of Medicine, Department of Biochemistry
    Authors
    Peter D'Eustachio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Defective TALDO1 (transaldolase 1) fails to transform fructose 6-phosphate (Fru(6)P) and erythrose 4-phosphate (E4P) to sedoheptulose 7-phosphate (SH7P) and glyceraldehyde 3-phosphate (GA3P). This defect has been associated with congenital liver disease and an array of other symptoms. The deficiency was first described by Verhoeven and colleagues (2001). Both the range and severity of these abnormalities are variable from patient to patient (Wamelink et al. 2008a; Eyaid et al. 2013). The three missense mutant alleles annotated here are associated with absence of detectable transaldolase activity in tissues from homozygous affected individuals (LeDuc et al. 2014; Verhoeven et al. 2005; Wamelink et al. 2008b).

  12. Cyclisitic Trip Data 2019 (Google)

    • kaggle.com
    zip
    Updated Aug 4, 2022
    Cite
    Shaine Pepper (2022). Cyclisitic Trip Data 2019 (Google) [Dataset]. https://www.kaggle.com/datasets/shainepepper/divvy-2019-trip-data-clean
    Explore at:
    Available download formats: zip (27551971 bytes)
    Dataset updated
    Aug 4, 2022
    Authors
    Shaine Pepper
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Intro

    Cleaning this data took some time due to many NULL values, typos, and unorganized collection. My first step was to put the dataset into R and work my magic there. After analyzing and cleaning the data, I moved the data to Tableau to create easily understandable and helpful graphs. This step was a learning curve because there are so many potential options inside Tableau. Finding the correct graph to share my findings while keeping the stakeholders' tasks in mind was my biggest obstacle.

    RStudio

    First, I needed to combine the 4 datasets into 1, which I did using the rbind() function.

    Step two was to rename columns that had typos or poor names.

    colnames(Cyclistic_Data_2019)[colnames(Cyclistic_Data_2019) == "tripduration"] <- "trip_duration"
    colnames(Cyclistic_Data_2019)[colnames(Cyclistic_Data_2019) == "bikeid"] <- "bike_id"
    colnames(Cyclistic_Data_2019)[colnames(Cyclistic_Data_2019) == "usertype"] <- "user_type"
    colnames(Cyclistic_Data_2019)[colnames(Cyclistic_Data_2019) == "birthyear"] <- "birth_year"

    The next step was to remove all NULL values and exaggerated numbers, such as trip durations of more than 10 hours.

    library(dplyr)
    # Keep only rows where no character column holds the literal string "NULL"
    Cyclistic_Clean_v2 <- Cyclistic_Data_2019 %>%
      filter(if_all(where(is.character), ~ . != "NULL")) %>%
      type.convert(as.is = TRUE)

    After removing the NULL data, it was time to remove potential typos and poorly collected data. I could only identify exaggerated data in the "trip_duration" column, where there were multiple cases of trips lasting more than 2,000,000 seconds. To find these large values, I used the count() function.

    Cyclistic_Clean_v2 %>% count(trip_duration > "30000")

    After finding multiple instances of this, I hit a snag: the trip_duration column was stored as character when it needed to be numeric to be cleaned further. It took me quite a while to find out that this was the issue, and then I remembered the class() function. With it, I could easily see that the column type was wrong.

    class(Cyclistic_Clean_v2$trip_duration)

    Having identified the type, I still had some work to do before converting the column to a number, as it contained quotation marks, periods, and a trailing 0. To remove these, I used the gsub() function.

    # Remove quotation marks first, then a literal trailing ".0"
    Cyclistic_Clean_v2$trip_duration <- gsub('"', '', Cyclistic_Clean_v2$trip_duration)
    Cyclistic_Clean_v2$trip_duration <- gsub("\\.0$", "", Cyclistic_Clean_v2$trip_duration)

    Now that unwanted characters are gone, we can convert the column into numeric.

    Cyclistic_Clean_v2$trip_duration <- as.numeric(Cyclistic_Clean_v2$trip_duration)
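The same clean-up can be sketched in Python for readers who work outside R. The function name `clean_trip_duration` is hypothetical, and the sample inputs are assumptions based on the description (quoted values with a trailing `.0`). Escaping and anchoring the pattern matters in either language, because an unescaped `.` in a regular expression matches any character, not just a literal period.

```python
import re


def clean_trip_duration(raw: str) -> float:
    """Strip quotation marks and a trailing '.0', then convert to a number."""
    text = raw.replace('"', "")
    # r"\.0$" matches only a literal ".0" at the very end of the string.
    text = re.sub(r"\.0$", "", text)
    return float(text)


# An unescaped, unanchored pattern removes far more than intended.
naive = re.sub(".0", "", "300.0")
```

Here `naive` ends up as `"0"`, because `.0` matches both `30` and the literal `.0`, while `clean_trip_duration("300.0")` returns `300.0` as intended.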

    Doing this allows Tableau and R to read the data properly to create graphs without error.

    Next, I created a backup dataset in case there was any issue while exporting.

    Cyclistic_Clean_v3 <- Cyclistic_Clean_v2
    write.csv(Cyclistic_Clean_v2, "Folder.Path/Cyclistic_Data_Cleaned_2019.csv", row.names = FALSE)

    After exporting, I came to the conclusion that I should have kept a more accurate change log rather than brief notes. That is one major lesson I will take away from this project.

    All around, I had a lot of fun using R to transform and analyze the data. I learned many different ways to clean data efficiently.

    Tableau

    Now onto the fun part! Tableau is a very good tool to learn. There are so many different ways to bring your data to life and show your creativity inside your work. After a few guides and errors, I could finally start building graphs to bring the stakeholders' tasks to fruition.

    Charts

    Please note that these were all made in Tableau and are meant to be interactive. The charts are hosted on imgur.com:

    • The relation between male and female riders.
    • Male vs. female trip duration by user type.
    • Busiest stations filtered by month. (This is meant to be interactive.)
    • Most popular starting stations.
    • Most popular ending stations.

    Conclusion

    My main goal was to help find out how Cyclistic can convert casual riders into subscribers. Here are my findings.

    1. Casual riders take much longer trips than subscribers.
    2. Although there are many more male riders, females tend to ride longer than males.
    3. Stations #562 & #568 are the most busy by a h...
  13. Defective APRT does not convert adenine to AMP

    • reactome.org
    biopax2, biopax3 +5
    Updated Jul 10, 2021
    + more versions
    Cite
    Peter D'Eustachio (2021). Defective APRT does not convert adenine to AMP [Dataset]. https://reactome.org/content/detail/R-HSA-9734193
    Explore at:
    Available download formats: sbml, owl, biopax3, biopax2, docx, pdf, sbgn
    Dataset updated
    Jul 10, 2021
    Dataset provided by
    NYU School of Medicine, Department of Biochemistry
    Authors
    Peter D'Eustachio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Normally in humans, adenine generated in processes such as polyamine biosynthesis can be salvaged by conversion to AMP, catalyzed by APRT (adenine phosphoribosyltransferase). In the absence of APRT activity, however, accumulated adenine is instead converted to 2,8-dioxo-adenine. Accumulation of insoluble crystals of 2,8-dioxo-adenine in the kidneys causes the kidney damage that is a major symptom of APRT deficiency in humans (Van Acker et al. 1977; Bollée et al. 2012). Three missense mutant alleles are annotated here (Chen et al. 1991; Hidaka et al. 1988; Sahota et al. 1994); nonsense, insertion-deletion, and splice-site mutations have also been reported (reviewed by Bollée et al. 2012).

  14. Customer Behavior & Purchase Dataset

    • kaggle.com
    zip
    Updated Dec 13, 2025
    Cite
    Mohammad ali raja (2025). Customer Behavior & Purchase Dataset [Dataset]. https://www.kaggle.com/datasets/mohammadaliraja/customer-data
    Explore at:
    Available download formats: zip (23484 bytes)
    Dataset updated
    Dec 13, 2025
    Authors
    Mohammad ali raja
    Description

    This is a synthetic dataset simulating 1,000 customers of an e-commerce business. It is designed for intermediate-level data science tasks: EDA, feature engineering, classification (predicting whether a customer will purchase next month), regression (predicting next month's spend), and customer segmentation (clustering). The dataset contains demographic, engagement, and historical purchase features with realistic correlations and controlled missingness for preprocessing exercises.

    Data Dictionary (variable descriptions):
    - CustomerID (int): Unique customer identifier.
    - Age (int): Age in years.
    - Gender (categorical): 'Male', 'Female', or missing.
    - Membership (categorical): 'None', 'Basic', 'Premium' - indicates loyalty program.
    - Annual_Income_kUSD (float): Annual income in thousands USD.
    - Region (categorical): Geographic region bucket.
    - Platform (categorical): 'Mobile', 'Desktop', 'Tablet' - main device used.
    - Product_Preference (categorical): Most frequently browsed product category.
    - Visits_per_Month (int): Average monthly site visits.
    - Time_Spent_per_Visit_min (float): Average session duration in minutes (some missing values).
    - Pages_Visited_per_Session (int): Average pages visited per session.
    - Previous_Purchases (int): Count of historical purchases.
    - Avg_Purchase_Value_USD (float): Average purchase value in USD (some missing values).
    - Total_Purchase_Value_USD (float): Lifetime total spending in USD.
    - Days_Since_Last_Purchase (int): Days since last purchase.
    - Made_Purchase_Last_Month (binary): 1 if a purchase was made in the last month, else 0.
    - Will_Purchase_Next_Month (binary): Target for classification (1/0).
    - Next_Month_Spend_USD (float): Regression target (0.0 if no purchase expected).

    Missingness:
    - ~6% missing in Time_Spent_per_Visit_min.
    - ~4% missing in Avg_Purchase_Value_USD.
    - ~3% missing Gender values.
    These are intentional to give realistic preprocessing tasks (imputation, encoding).
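As one possible way to approach the imputation exercise, here is a minimal median-imputation sketch in plain Python. The function name `impute_median` and the use of `None` to mark missing values are assumptions; the dataset description does not prescribe a particular method.

```python
from statistics import median


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical sample of Time_Spent_per_Visit_min with one missing entry.
filled = impute_median([4.0, None, 6.0, 10.0])
```

The median is often preferred over the mean for skewed features such as session duration, though which is better for this dataset is a judgment call.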

    Tags: e-commerce, customer-behavior, classification, regression, segmentation, synthetic, intermediate

    Quality Notes & Limitations:
    - Synthetic data; patterns were generated to be realistic but are not from real customers.
    - Useful for learning and coursework but not for production decisions.
    - Some variables are correlated by design to allow model-building exercises.

    Suggested baseline models & evaluation metrics:

    1. Classification (Will_Purchase_Next_Month):

      • Models: Logistic Regression, Random Forest Classifier, XGBoost/LightGBM (if available), KNN, SVM.
      • Metrics: Accuracy, Precision, Recall, F1-score, ROC-AUC.
      • Suggested steps: Handle missing values (imputation), encode categoricals, scale numeric features, train/test split (e.g., 80/20), cross-validation, and evaluate metrics.
    2. Regression (Next_Month_Spend_USD):

      • Models: Linear Regression, Random Forest Regressor, Gradient Boosting, KNN Regressor.
      • Metrics: RMSE, MAE, R^2.
      • Suggested steps: log-transform skewed targets if needed, treat zeros (non-purchasers) appropriately, consider two-stage modeling (classification first, then regression for those predicted to purchase).
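The evaluation metrics named above can be computed from their standard definitions without any modelling library. The sketch below is illustrative only; the function names `classification_metrics` and `regression_metrics` are hypothetical, and labels are assumed to be encoded as 1/0 as in the data dictionary.

```python
from math import sqrt


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary 1/0 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def regression_metrics(y_true, y_pred):
    """MAE, RMSE, and R^2 for numeric targets such as Next_Month_Spend_USD."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": sqrt(ss_res / n),
        "R2": 1 - ss_res / ss_tot,  # undefined when all true values are equal
    }


clf = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
reg = regression_metrics([0.0, 10.0, 20.0, 30.0], [0.0, 12.0, 18.0, 30.0])
```

ROC-AUC is omitted because it needs predicted probabilities rather than hard labels.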
  15. Dataset.

    • figshare.com
    xlsx
    Updated Jan 7, 2025
    + more versions
    Cite
    Chihiro Suzuki; Yoko Suzuki; Takashi Abe; Takashi Kanbayashi; Shoji Fukusumi; Toshio Kokubo; Isamu Takahara; Masashi Yanagisawa (2025). Dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0316579.s014
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jan 7, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Chihiro Suzuki; Yoko Suzuki; Takashi Abe; Takashi Kanbayashi; Shoji Fukusumi; Toshio Kokubo; Isamu Takahara; Masashi Yanagisawa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In remote areas, visiting a laboratory for sleep testing is inconvenient. We, therefore, developed a Mobile Sleep Lab in a bus powered by fuel cells with two sleep measurement chambers. As the environment in the bus could affect sleep, we examined whether sleep testing in the Mobile Sleep Lab was as feasible as in a conventional sleep laboratory (Human Sleep Lab). We tested 15 healthy adults for four nights using polysomnography (the first two nights at the Human Sleep Lab or Mobile Sleep Lab with a switch to the other facility for the next two nights). Sleep variables of the four measurements were used to assess the discrepancy of different places or different nights. No significant differences were found between the laboratories other than the percentage of total sleep time in stage N3. Next, we analyzed the intraclass correlation coefficient to evaluate the test-retest reliability. The intraclass correlation coefficient between these two measurements: the Human Sleep Lab and Mobile Sleep Lab showed similar reliability for the same sleep variables. The intraclass correlation coefficient revealed that several sleep indexes, such as total sleep time, sleep efficiency, wake after sleep onset, percentage of stage N1, and stage R latency, showed poor reliabilities (

  16. R

    Defective CYP19A1 does not convert ANDST to E1

    • reactome.org
    biopax2, biopax3 +5
    + more versions
    Cite
    Bijay Jassal, Defective CYP19A1 does not convert ANDST to E1 [Dataset]. https://reactome.org/content/detail/R-HSA-5601849
    Explore at:
    Available download formats: docx, sbml, biopax2, sbgn, owl, biopax3, pdf
    Dataset provided by
    Ontario Institute for Cancer Research
    Authors
    Bijay Jassal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Aromatase (CYP19A1) catalyses the conversion of androstenedione (ANDST) to estrone (E1). Defects in CYP19A1 can cause aromatase excess syndrome (AEXS; MIM:139300) and aromatase deficiency (AROD; MIM:613546). Affected individuals cannot synthesise endogenous estrogens. In females the lack of estrogen leads to pseudohermaphroditism and progressive virilization at puberty, whereas in males pubertal development is normal. Mutations causing AEXS include C437Y, R375C, R365Q and E210K (Ito et al. 1993, Morishima et al. 1995, Carani et al. 1997, Maffei et al. 2004).

  17. R

    FIXa variant:FVIIIa does not convert FX to the active FXa

    • reactome.org
    biopax2, biopax3 +5
    Cite
    Veronica Shamovsky, FIXa variant:FVIIIa does not convert FX to the active FXa [Dataset]. https://reactome.org/content/detail/R-HSA-9670874
    Explore at:
    Available download formats: sbgn, biopax2, sbml, biopax3, pdf, owl, docx
    Dataset provided by
    NYU School of Medicine, Department of Biochemistry
    Authors
    Veronica Shamovsky
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Factor IX (FIX) is a vitamin K-dependent trypsin-like serine protease zymogen in plasma, which upon activation to its active form (FIXa), binds to cofactor VIIIa (FVIIIa) on negatively charged membrane surfaces in the presence of Ca2+ to activate factor X (FX) in the intrinsic pathway of the blood clotting cascade (Davie EW et al. 1991; Ngo JC et al. 2008). FIX deficiency is associated with mild to severe bleeding in hemophilia B (HB) patients (Rallapalli PM et al. 2013). HB is caused by a wide range of mutations that can include point mutations (nonsense and missense), insertions, deletions and other complex rearrangements of the F9 gene (Rallapalli PM et al. 2013). Exons 7 and 8 encode the catalytic domain of FIX, which is responsible for the subsequent activation of FX in the coagulation cascade. Disease-causing mutations in exons 7 and 8 produce dysfunctional FIX with impaired clotting enzyme activity (Usharani P et al. 1985; Attree O et al. 1989; Bajaj SP et al. 1990; Spitzer SG et al. 1990; Ludwig M et al. 1992; Lu Q et al. 2015). The Reactome event describes failed generation of FXa as the functional consequence of the defective serine protease activity of hemophilia B (HB)-associated FIX variants such as G363R & G363E (Lu Q et al. 2015), G357E (Miyata T et al. 1991), A436V (Usharani P et al. 1985), I443T (Hamaguchi N et al. 1991), G409V (Bajaj SP et al. 1990), D410H and S411G (Ludwig M et al. 1992).

  18. Time series of coordinates for station MAR1

    • doi.pangaea.de
    html, tsv
    Updated Jul 23, 2021
    + more versions
    Cite
    Luciano Pedro Oscar Mendoza; Eric R Marderwald; Mirko Scheinert; Andreas Richter; José Luis Hormaechea; Gerardo Connon; Reinhard Dietrich; Raúl A Perdomo (2021). Time series of coordinates for station MAR1 [Dataset]. http://doi.org/10.1594/PANGAEA.933995
    Explore at:
    Available download formats: html, tsv
    Dataset updated
    Jul 23, 2021
    Dataset provided by
    PANGAEA
    Authors
    Luciano Pedro Oscar Mendoza; Eric R Marderwald; Mirko Scheinert; Andreas Richter; José Luis Hormaechea; Gerardo Connon; Reinhard Dietrich; Raúl A Perdomo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Mar 2, 2007 - Jan 25, 2013
    Area covered
    Variables measured
    DATE/TIME, Cartesian coordinate, x, Cartesian coordinate, y, Cartesian coordinate, z
    Description

    This dataset is about: Time series of coordinates for station MAR1. Please consult parent dataset @ https://doi.org/10.1594/PANGAEA.934034 for more information.
