64 datasets found
  1. Learn Data Science Series Part 1

    • kaggle.com
    Updated Dec 30, 2022
    Cite
    Rupesh Kumar (2022). Learn Data Science Series Part 1 [Dataset]. https://www.kaggle.com/datasets/hunter0007/learn-data-science-part-1
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 30, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rupesh Kumar
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Please feel free to share it with others and consider supporting me if you find it helpful ⭐️.

    Overview:

    • Chapter 1: Getting started with pandas
    • Chapter 2: Analysis: Bringing it all together and making decisions
    • Chapter 3: Appending to DataFrame
    • Chapter 4: Boolean indexing of dataframes
    • Chapter 5: Categorical data
    • Chapter 6: Computational Tools
    • Chapter 7: Creating DataFrames
    • Chapter 8: Cross sections of different axes with MultiIndex
    • Chapter 9: Data Types
    • Chapter 10: Dealing with categorical variables
    • Chapter 11: Duplicated data
    • Chapter 12: Getting information about DataFrames
    • Chapter 13: Gotchas of pandas
    • Chapter 14: Graphs and Visualizations
    • Chapter 15: Grouping Data
    • Chapter 16: Grouping Time Series Data
    • Chapter 17: Holiday Calendars
    • Chapter 18: Indexing and selecting data
    • Chapter 19: IO for Google BigQuery
    • Chapter 20: JSON
    • Chapter 21: Making Pandas Play Nice With Native Python Datatypes
    • Chapter 22: Map Values
    • Chapter 23: Merge, join, and concatenate
    • Chapter 24: Meta: Documentation Guidelines
    • Chapter 25: Missing Data
    • Chapter 26: MultiIndex
    • Chapter 27: Pandas Datareader
    • Chapter 28: Pandas IO tools (reading and saving data sets)
    • Chapter 29: pd.DataFrame.apply
    • Chapter 30: Read MySQL to DataFrame
    • Chapter 31: Read SQL Server to Dataframe
    • Chapter 32: Reading files into pandas DataFrame
    • Chapter 33: Resampling
    • Chapter 34: Reshaping and pivoting
    • Chapter 35: Save pandas dataframe to a csv file
    • Chapter 36: Series
    • Chapter 37: Shifting and Lagging Data
    • Chapter 38: Simple manipulation of DataFrames
    • Chapter 39: String manipulation
    • Chapter 40: Using .ix, .iloc, .loc, .at and .iat to access a DataFrame
    • Chapter 41: Working with Time Series
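
    As a small taste of the material above, a minimal sketch of two of the listed techniques, boolean indexing (Chapter 4) and grouping data (Chapter 15); the example data is hypothetical, not part of the dataset:

    ```python
    import pandas as pd

    # Hypothetical example data (not part of the dataset)
    df = pd.DataFrame({
        "city": ["Pune", "Delhi", "Pune", "Delhi"],
        "sales": [120, 80, 95, 130],
    })

    # Boolean indexing: keep rows where sales exceed 100
    high = df[df["sales"] > 100]

    # Grouping data: total sales per city
    totals = df.groupby("city")["sales"].sum()
    ```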
  2. inDecay Training data : processed dataframe + indelgen

    • figshare.com
    txt
    Updated Feb 4, 2024
    Cite
    Wergillius Zheng (2024). inDecay Training data : processed dataframe + indelgen [Dataset]. http://doi.org/10.6084/m9.figshare.25133564.v2
    Explore at:
    txt (available download formats)
    Dataset updated
    Feb 4, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Wergillius Zheng
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The training data for reimplementing inDecay and FORECasT. The fasta file records the guide RNA, strand, cut site, and target sequence, matched by OligoID. The indelgen folder contains an indelgen file for each OligoID; each indelgen file records all possible indel events estimated from the target sequence. Finally, there are five processed dataframes (very large CSVs) containing all observed events and their frequencies.
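
    Since the processed dataframes are very large CSVs, they may not fit in memory at once; a hedged sketch of chunked loading with pandas (the column names 'event' and 'frequency' are illustrative placeholders, not from the dataset):

    ```python
    import io
    import pandas as pd

    def sum_event_frequency(csv_source, chunksize=100_000):
        """Stream a large CSV in chunks and accumulate per-event frequency totals.
        Column names 'event' and 'frequency' are illustrative placeholders."""
        totals = {}
        for chunk in pd.read_csv(csv_source, chunksize=chunksize):
            for event, freq in chunk.groupby("event")["frequency"].sum().items():
                totals[event] = totals.get(event, 0) + freq
        return totals

    # Tiny in-memory stand-in for one of the large CSVs
    demo = io.StringIO("event,frequency\nins,2\ndel,3\nins,5\n")
    totals = sum_event_frequency(demo, chunksize=2)
    print(totals)  # {'del': 3, 'ins': 7}
    ```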

  3. Shopping Mall

    • kaggle.com
    Updated Dec 15, 2023
    Cite
    Anshul Pachauri (2023). Shopping Mall [Dataset]. https://www.kaggle.com/datasets/anshulpachauri/shopping-mall
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 15, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Anshul Pachauri
    Description

    Libraries Import:

    Importing necessary libraries such as pandas, seaborn, matplotlib, scikit-learn's KMeans, and warnings.

    Data Loading and Exploration:

    Reading a dataset named "Mall_Customers.csv" into a pandas DataFrame (df). Displaying the first few rows of the dataset using df.head(). Conducting univariate analysis by calculating descriptive statistics with df.describe().

    Univariate Analysis:

    Visualizing the distribution of the 'Annual Income (k$)' column using sns.distplot. Looping through selected columns ('Age', 'Annual Income (k$)', 'Spending Score (1-100)') and plotting individual distribution plots.

    Bivariate Analysis:

    Creating a scatter plot for 'Annual Income (k$)' vs 'Spending Score (1-100)' using sns.scatterplot. Generating a pair plot for selected columns with gender differentiation using sns.pairplot.

    Gender-Based Analysis:

    Grouping the data by 'Gender' and calculating the mean for selected columns. Computing the correlation matrix for the grouped data and visualizing it using a heatmap.

    Univariate Clustering:

    Applying KMeans clustering with 3 clusters based on 'Annual Income (k$)' and adding the 'Income Cluster' column to the DataFrame. Plotting the elbow method to determine the optimal number of clusters.

    Bivariate Clustering:

    Applying KMeans clustering with 5 clusters based on 'Annual Income (k$)' and 'Spending Score (1-100)' and adding the 'Spending and Income Cluster' column. Plotting the elbow method for bivariate clustering and visualizing the cluster centers on a scatter plot. Displaying a normalized cross-tabulation between 'Spending and Income Cluster' and 'Gender'.

    Multivariate Clustering:

    Performing multivariate clustering by creating dummy variables, scaling selected columns, and applying KMeans clustering. Plotting the elbow method for multivariate clustering.

    Result Saving:

    Saving the modified DataFrame with cluster information to a CSV file named "Result.csv". Saving the multivariate clustering plot as an image file ("Multivariate_figure.png").
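
    The bivariate clustering and elbow-method steps described above can be sketched roughly as follows, with synthetic data standing in for Mall_Customers.csv (column names follow the description; the rest is an assumption):

    ```python
    import numpy as np
    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    # Synthetic stand-in for Mall_Customers.csv
    df = pd.DataFrame({
        "Annual Income (k$)": rng.uniform(15, 140, 200),
        "Spending Score (1-100)": rng.uniform(1, 100, 200),
    })

    # Bivariate clustering with 5 clusters, as in the description
    X = StandardScaler().fit_transform(df)
    km = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X)
    df["Spending and Income Cluster"] = km.labels_

    # Elbow method: inertia for k = 1..10 (plot inertias against k to pick the elbow)
    inertias = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
                for k in range(1, 11)]
    ```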

  4. Dataset for: Infectious disease responses to human climate change...

    • zenodo.org
    csv
    Updated Aug 16, 2024
    Cite
    Georgia Titcomb; Georgia Titcomb; Johnny Uelmen; Johnny Uelmen; Mark Janko; Mark Janko; Charles Nunn; Charles Nunn (2024). Dataset for: Infectious disease responses to human climate change adaptations [Dataset]. http://doi.org/10.5281/zenodo.13314361
    Explore at:
    csv (available download formats)
    Dataset updated
    Aug 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Georgia Titcomb; Georgia Titcomb; Johnny Uelmen; Johnny Uelmen; Mark Janko; Mark Janko; Charles Nunn; Charles Nunn
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Measurement technique
    This dataset includes original data sources and data that have been extracted from other sources that are referenced in the manuscript entitled "Infectious disease responses to human climate change adaptations".

    Original data:

    Table_1_source_papers

    We conducted a Web of Science search following PRISMA guidelines (SI I). Search terms included each topic, followed by "AND (infectious disease* OR zoono* OR pathogen* OR parasit*) AND (human OR people)." Papers were assessed for any positive, negative, or neutral link between each topic (dam construction, crop shifts, rainwater harvesting, mining, migration, carbon sequestration, and public transit) and human infectious diseases. Searches on poultry and transit returned >5,000 papers, so searches were restricted to review topics only. We further restricted the 3,479 results for livestock shifts to those with 'shift' in the abstract. Following screening of 3,485 papers (6,964 including all livestock), 108 papers met the initial review criteria of being relevant to each adaptation or mitigation and discussing a human infectious disease, of which only 14 were quantitative studies with a control or reference group.

    Extracted data:

    • change_livestock_country: Data were extracted from Ogutu 2016 supplementary materials and include percent change calculations for different livestock in different Kenyan counties.
      Original data source citation: Ogutu, J. O., Piepho, H.-P., Said, M. Y., Ojwang, G. O., Njino, L. W., Kifugo, S. C., & Wargute, P. W. (2016). Extreme wildlife declines and concurrent increase in livestock numbers in Kenya: What are the causes? PLoS ONE, 11(9), e0163249. https://doi.org/10.1371/journal.pone.0163249
    • country_avg_schist_wormy_world: Schistosomiasis survey data were obtained from the Global Atlas of Helminth Infection and were generated by downloading map data in csv format. Prevalence values were calculated by taking the mean maximum prevalence.
      Original data source citation: London Applied & Spatial Epidemiology Research Group (LASER). (2023). Global Atlas of Helminth Infections: STH and Schistosomiasis [dataset]. London School of Hygiene and Tropical Medicine. https://lshtm.maps.arcgis.com/apps/webappviewer/index.html?id=2e1bc70731114537a8504e3260b6fbc0
    • kenya_precip_change_1951_2020: Data were extracted from the Climate Change Knowledge Portal and downloaded in csv format.
      Original data source citation: World Bank Group. (2023). Climate Data & Projections—Kenya. Climate Change Knowledge Portal. https://climateknowledgeportal.worldbank.org/country/kenya/climate-data-projections
    Description

    Original and derived data products referenced in the original manuscript are provided in the data package.

    Description of the data and file structure

    Original data:

    Table_1_source_papers.csv: Papers that met review criteria and which are summarized in Table 1 of the manuscript.

    1. ID: The paper identification number
    2. Topic: The broad topic (i.e., each row of Table 1)
    3. Authors: The names of the authors of the paper
    4. Article Title: The title of the paper
    5. Source Title: The name of the journal in which the paper was published
    6. Abstract: The paper's abstract, retrieved from the Web of Science search
    7. study_type: Classification of the study methodology/approach. "A" = a designed study that shows effect ,"B" = a pre/post study, "C" = a comparison of health outcomes or pathogen risk relative to a 'control/comparison' area, "D" = some quantitative effect but no control, "E" = qualitative comments but little supporting evidence, and/or a qualitative review.
    8. pathogen_broad: Broad classification of the type of pathogen discussed in the paper.
    9. transmission_type: Categorization of indirect, direct, sexual, vector, or other transmission modes.
    10. pathogen_type: Categorization of bacteria, helminth, virus, protozoa, fungi, or other pathogen types.
    11. country: Country in which the study was performed or results discussed. When countries were not available, regions were used. NA values indicate papers in which a geographic region was not relevant to the study (i.e., a methods-based study).

    Derived data:

    change_livestock_country.csv: A dataframe containing values used to generate Figure 4a in the manuscript.

    1. County Name: The name of the county in Kenya
    2. Sheep and goats 1980: The estimated number of sheep and goats in 1980
    3. Sheep and goats 2016: The estimated number of sheep and goats in 2016
    4. pct_change_shoat: The percent change in sheep and goat numbers from 1980 to 2016
    5. Cattle 1980: The estimated number of cattle in 1980
    6. Cattle 2016: The estimated number of cattle in 2016
    7. pct_change_cattle: The percent change in cattle numbers from 1980 to 2016
    8. Camel 1980: The estimated number of camels in 1980
    9. Camel 2016: The estimated number of camels in 2016
    10. pct_change_camel: The percent change in camel numbers from 1980 to 2016
    11. human_pop 1980: The estimated human population in the county in 1980
    12. human_pop 2016: The estimated human population in the county in 2016
    13. pct_change_human: The percent change in the human population from 1980 to 2016
    14. area_sq_km: The land area of the county
    15. change_ind_per_sq_km_shoat: Change in the number of sheep and goats per square kilometer from 1980 to 2016
    16. change_ind_per_sq_km_cattle: Change in the number of cattle per square kilometer from 1980 to 2016
    17. change_ind_per_sq_km_camel: Change in the number of camels per square kilometer from 1980 to 2016

    country_avg_schist_wormy_world.csv: A dataframe containing values used to generate Figure 3 in the manuscript.

    • Country: The country in which the schistosome prevalence studies were performed.
    • Latitude: The latitude in decimal degrees
    • Longitude: The longitude in decimal degrees
    • Maximum.prevalence: The mean maximum schistosomiasis prevalence of studies conducted within each country.

    kenya_precip_change_1951_2020.csv: A dataframe containing values used to generate Figure 4b in the manuscript.

    • Precipitation (mm): Binned annual precipitation values
    • 1951-1980: The density of observations for each annual precipitation value for the 1951-1980 period
    • 1971-2000: The density of observations for each annual precipitation value for the 1971-2000 period
    • 1991-2020: The density of observations for each annual precipitation value for the 1991-2020 period

    Sharing/Access information

    Data were derived from the following sources:

  5. Data from: dblp XML dataset as CSV for Python Data Analysis Library

    • observatorio-cientifico.ua.es
    Updated 2021
    Cite
    Carrasco, Rafael C.; Candela, Gustavo; Carrasco, Rafael C.; Candela, Gustavo (2021). dblp XML dataset as CSV for Python Data Analysis Library [Dataset]. https://observatorio-cientifico.ua.es/documentos/668fc45db9e7c03b01bdb2d0
    Explore at:
    Dataset updated
    2021
    Authors
    Carrasco, Rafael C.; Candela, Gustavo; Carrasco, Rafael C.; Candela, Gustavo
    Description

    Based on the dblp XML file, this dataset consists of a CSV file that has been extracted using a Python script. The dataset can be easily loaded into a Python Data Analysis Library (pandas) dataframe.

  6. Longitudinal corpus of privacy policies

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 12, 2022
    Cite
    Wagner, Isabel (2022). Longitudinal corpus of privacy policies [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5841138
    Explore at:
    Dataset updated
    Dec 12, 2022
    Dataset authored and provided by
    Wagner, Isabel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a corpus of 56,416 unique privacy policy texts spanning the years 1996-2021.

    policy-texts.zip contains a directory of text files with the policy texts. File names are the hashes of the policy text.

    policy-metadata.zip contains two CSV files (can be imported into a pandas dataframe) with policy metadata including readability measures for each policy text.

    labeled-policies.zip contains CSV files with content labels for each policy. Labeling was done using a BERT classifier.

    Details on the methodology can be found in the accompanying paper.
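
    The metadata CSVs can be read straight out of the zip archives with pandas and the standard library; a minimal sketch (the member name used below is hypothetical, not the archive's real file name):

    ```python
    import io
    import zipfile
    import pandas as pd

    def read_csvs_from_zip(zip_path):
        """Return {member_name: DataFrame} for every CSV inside a zip archive."""
        frames = {}
        with zipfile.ZipFile(zip_path) as zf:
            for name in zf.namelist():
                if name.endswith(".csv"):
                    frames[name] = pd.read_csv(io.BytesIO(zf.read(name)))
        return frames

    # Build a tiny demo archive in memory to show the round trip
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("policy-metadata.csv", "hash,flesch_reading_ease\nabc123,41.2\n")
    frames = read_csvs_from_zip(buf)
    print(frames["policy-metadata.csv"].shape)  # (1, 2)
    ```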

  7. HiPR-FISH Spatial Mapping of Cheese Rind Microbial Communities

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 13, 2023
    Cite
    Dutton, Rachel J (2023). HiPR-FISH Spatial Mapping of Cheese Rind Microbial Communities [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7613702
    Explore at:
    Dataset updated
    Apr 13, 2023
    Dataset provided by
    Dutton, Rachel J
    Weiss, Emily CP
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is associated with this HiPR-FISH Spatial Mapping of Cheese Rind Microbial Communities pub from Arcadia Science.

    HiPR-FISH spatial imaging was used to look at the distribution of microbes within five distinct microbial communities growing on the surface of aged cheeses. Probe design and imaging was performed by Kanvas Biosciences.

    This dataset includes the following:

    For each field of view (roughly 135µm x 135µm; 7 FOVs per cheese specimen):

    A fluorescence intensity image (*_spectral_max_projection.png/.tif).

    A pseudo-colored microbe-labeled image (*_identification.png/.tif).

    A data frame containing each identified microbe's identity, position, and size (*_cell_information.csv).

    A segmented mask for microbiota (*_segmentation.png/.tif)

    A spatial proximity graph showing, for species located close to each other, the spatial enrichment relative to a random distribution (*_spatialheatmap.png).

    A corresponding data frame used to generate the spatial proximity graph (*_absolute_spatial_association.csv) and a dataframe for the average of 500 random shuffles of the taxa (*_randomized_spatial_association_matrix.csv).

    For each cheese specimen:

    A widefield image with FOVs located on the image (*_WF_overlay.png).

    In general:

    A png showing the color legend for each species. (ARC1_taxa_color_legend.png)

    A data frame showing the environmental location of each FOV in the cheese (RIND/CURD) and the location of each FOV relative to FOV 1. (ARC1_Cheese_Map.csv).

    A vignette showing an example of each cell and its false coloring according to its taxonomic identification (ARC1_detected_species_representative_cell_vignette.png).

    Sequences used as input in probe design (16S_18S_forKanvas.fasta).

    A CSV file containing the sequences that belong to each ASV (ARC1_sequences_to_ASVs.csv).

    Plots of log-transformed counts for each microbe detected across all FOVs, and broken down for each cheese (*detected_species_absolute_abundance.png).

    CSVs containing pairwise correlation of FOVs based on spatial association (ARC1_spatial_association_FOV_correlation.csv) and microbial abundance (ARC1_abundance_FOV_correlation.csv).

    Plots of spatial association matrices, aggregated for different cheeses and different locations (RIND vs CURD) (*samples_*loc_relative_spatial_association.png).

    CSV containing the principal component coordinates for each FOV (ARC1_abundance_FOV_PCA.csv, ARC1_spatial_association_FOV_PCA.csv).

    CSV containing the mean fold-change in number of edges between each ASV and the corresponding p-value when compared to the null state (random spatial association matrices) (ARC1_spatial_enrichment_significance.csv).

  8. oldIT2modIT

    • huggingface.co
    Updated Jun 3, 2025
    Cite
    Massimo Romano (2025). oldIT2modIT [Dataset]. https://huggingface.co/datasets/cybernetic-m/oldIT2modIT
    Explore at:
    Dataset updated
    Jun 3, 2025
    Authors
    Massimo Romano
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Download the dataset

    At the moment, to download the dataset you should use a pandas DataFrame:

    import pandas as pd
    df = pd.read_csv("https://huggingface.co/datasets/cybernetic-m/oldIT2modIT/resolve/main/oldIT2modIT_dataset.csv")

    You can visualize the dataset with:

    df.head()

    To convert it into a Hugging Face dataset:

    from datasets import Dataset
    dataset = Dataset.from_pandas(df)

    Dataset Description

    This is an Italian dataset formed by 200 old (ancient) Italian sentences and… See the full description on the dataset page: https://huggingface.co/datasets/cybernetic-m/oldIT2modIT.

  9. Dataset for: Cattle aggregations at shared resources create potential...

    • search.dataone.org
    • data.niaid.nih.gov
    Updated Nov 29, 2023
    Cite
    Georgia Titcomb; Jenna Hulke; John Naisikie Mantas; Benard Gituku; Hillary Young (2023). Dataset for: Cattle aggregations at shared resources create potential parasite exposure hotspots for wildlife [Dataset]. http://doi.org/10.5061/dryad.vdncjsz28
    Explore at:
    Dataset updated
    Nov 29, 2023
    Dataset provided by
    Dryad Digital Repository
    Authors
    Georgia Titcomb; Jenna Hulke; John Naisikie Mantas; Benard Gituku; Hillary Young
    Time period covered
    Jan 1, 2023
    Description

    Globally rising livestock populations and declining wildlife numbers are likely to dramatically change disease risk for wildlife and livestock, especially at resources where they congregate. However, limited understanding of interspecific transmission dynamics at these hotspots hinders disease prediction or mitigation. In this study, we combined gastrointestinal nematode density and host foraging activity measurements from our prior work in this system with three estimates of parasite-sharing capacity to investigate how interspecific exposures alter the relative riskiness of an important resource – water – among cattle and five dominant herbivore species in an East African tropical savanna. We found that due to their high parasite output, water dependence, and parasite-sharing capacity, cattle greatly increased potential parasite exposures at water sources for wild ruminants. When untreated for parasites, cattle accounted for over two-thirds of total potential exposures around water fo...

    # Dataset for Cattle aggregations at shared resources create potential parasite exposure hotspots for wildlife

    https://doi.org/10.5061/dryad.vdncjsz28

    These data accompany the publication "Cattle aggregations at shared resources create potential parasite exposure hotspots for wildlife" in Proceedings of the Royal Society B: Biological Sciences (doi: 10.1098/rspb.2023-2239).

    Description of the data and file structure

    The data include three data files and code to replicate results of the publication. Specifically, the data files are:

    1. fec_data.csv: A dataframe containing parasite fecal egg count values for focal species in the study.
    2. parasite_risk_at_water.csv: A dataframe containing information on parasite exposure estimates for different species over the study period. Columns are as follows:
      1. Period: The dung survey period (numbered based on number of months elapsed).
      2. Date: The initial survey date.
      3. ...
  10. National Water Model RouteLinks CSV

    • dataone.org
    • hydroshare.org
    Updated Apr 15, 2022
    Cite
    Jason A Regina; Austin Raney (2022). National Water Model RouteLinks CSV [Dataset]. http://doi.org/10.4211/hs.d154f19f762c4ee9b74be55f504325d3
    Explore at:
    Dataset updated
    Apr 15, 2022
    Dataset provided by
    Hydroshare
    Authors
    Jason A Regina; Austin Raney
    Time period covered
    Apr 12, 2019 - Oct 14, 2021
    Area covered
    Description

    This resource contains "RouteLink" files for version 2.1.6 of the National Water Model which are used to associate feature identifiers for computational reaches to relevant metadata. These data are important for comparing NWM feature data to USGS streamflow and lake observations. The original RouteLink files are in NetCDF format and available here: https://www.nco.ncep.noaa.gov/pmb/codes/nwprod

    This resource includes the files in a human-friendlier CSV format for easier use, and a machine-friendlier file in HDF5 format which contains a single pandas.DataFrame. The scripts and supporting utilities are also included for users that wish to rebuild these files. Source code is hosted here: https://github.com/jarq6c/NWM_RouteLinks
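
    One typical use, attaching RouteLink metadata to NWM feature data via the shared feature identifier, can be sketched as follows; the column names and values here are hypothetical stand-ins, so check the CSV header for the real ones:

    ```python
    import pandas as pd

    # Hypothetical RouteLink-style metadata: NWM feature id -> USGS gage id
    routelinks = pd.DataFrame({
        "nwm_feature_id": [101, 102, 103],
        "usgs_site_code": ["01646500", "01647000", "01648000"],
    })

    # Hypothetical simulated streamflow keyed by NWM feature id
    flows = pd.DataFrame({
        "nwm_feature_id": [101, 103],
        "streamflow_cms": [12.4, 3.7],
    })

    # Attach gage codes so model output can be compared with USGS observations
    merged = flows.merge(routelinks, on="nwm_feature_id", how="left")
    print(merged[["usgs_site_code", "streamflow_cms"]])
    ```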

  11. Speed profiles of freeways in California (I5-S and I210-E)

    • zenodo.org
    csv
    Updated Jan 24, 2020
    Cite
    Semin Kwak; Semin Kwak (2020). Speed profiles of freeways in California (I5-S and I210-E) [Dataset]. http://doi.org/10.5281/zenodo.3478594
    Explore at:
    csv (available download formats)
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Semin Kwak; Semin Kwak
    License

    Attribution 1.0 (CC BY 1.0): https://creativecommons.org/licenses/by/1.0/
    License information was derived automatically

    Area covered
    California
    Description

    Speed profiles of freeways in California (I5-S and I210-E). Original data is retrieved from PeMS.

    Each YEAR_FREEWAY.csv file contains Timestamp and Speed data.

    freeway_meta.csv file contains meta information for each detector: freeway number, direction, detector ID, absolute milepost, and x y coordinates.

    # Freeway speed data description
    
    ### Data loading example (single freeway: I5-S 2012)
    
    
    ```python
    %%time
    import pandas as pd
    
    # Date time parser (pd.datetime was removed in pandas 2.x; use datetime directly)
    from datetime import datetime
    mydateparser = lambda x: datetime.strptime(x, "%m/%d/%Y %H:%M:%S")
    
    # Freeway data loading (This part should be changed to a proper URL in zenodo.org)
    data = pd.read_csv("dataset/2012_I5S.csv", 
              parse_dates=["Timestamp"],
              date_parser=mydateparser).pivot(index="Timestamp",columns='Station_ID', values='Speed')
    
    
    # Meta data loading
    meta = pd.read_csv("dataset/freeway_meta.csv").set_index(['Fwy','Dir'])
    ```
    
      CPU times: user 50.5 s, sys: 911 ms, total: 51.4 s
      Wall time: 50.9 s
    
    
    ### Speed data and meta data
    
    
    ```python
    data.head()
    ```
    
    
    
    
    
    Station_ID              1     2     3     4     5     6     7     8     9    10  ...    80    81    82    83    84    85    86    87    88    89
    Timestamp
    2012-01-01 06:00:00  70.0  69.8  70.1  69.6  69.9  70.8  70.1  69.3  69.2  68.2  ...  72.1  67.6  71.0  66.8  65.9  58.2  67.1  63.8  67.1  71.6
    2012-01-01 06:05:00  69.2  69.8  69.8  69.4  69.5  69.5  68.3  67.5  67.4  67.2  ...  71.5  66.1  69.5  67.4  68.3  59.0  66.9  60.8  66.6  65.7
    2012-01-01 06:10:00  69.2  69.0  68.6  68.7  68.6  68.9  61.7  68.3  67.4  67.7  ...  71.1  65.2  71.2  66.5  65.4  59.6  66.3  58.4  68.2  65.6
    2012-01-01 06:15:00  69.9  69.6  69.7  69.2  69.0  69.1  65.3  67.6  67.1  66.8  ...  69.9  67.1  69.3  66.9  68.2  60.6  66.0  55.5  67.1  69.7
    2012-01-01 06:20:00  68.7  68.4  68.2  67.9  68.3  69.3  67.0  68.4  68.2  68.2  ...  70.9  67.2  69.9  65.6  66.7  62.8  66.2  62.6  67.2  67.5

    5 rows × 89 columns

    ```python
    meta.head()
    ```

             ID  Abs_mp   Latitude   Longitude
    Fwy Dir
    5   S     1   0.058  32.542731 -117.030501
        S     2   0.146  32.543587 -117.031769
        S     3   1.291  32.552409 -117.048120
        S     4   2.222  32.558422 -117.062360
        S     5   2.559  32.561106 -117.067228
    ### Choose a day

    ```python
    # Sampling (2012-01-13)
    myday = "2012-01-13"

    # Filter the data by the day
    myday_speed_data = data.loc[myday]
    ```

    ### A speed profile

    ```python
    from matplotlib import pyplot as plt
    import matplotlib.dates as mdates

    # Axis value setting
    mp = meta[meta.ID.isin(data.columns)].Abs_mp
    hour = myday_speed_data.index

    # Draw the day
    fig, ax = plt.subplots()
    heatmap = ax.pcolormesh(hour, mp, myday_speed_data.T, cmap=plt.cm.RdYlGn, vmin=0, vmax=80, alpha=1)
    plt.colorbar(heatmap, ax=ax)

    # Appearance setting
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%H"))
    plt.title(pd.Timestamp(myday).strftime("%Y-%m-%d [%a]"))
    plt.xlabel("hour")
    plt.ylabel("milepost")
    plt.show()
    ```

    ![png](output_9_0.png)

  12. GENEActiv accelerometer file related to the #120 OxWearables / stepcount...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 25, 2024
    Cite
    Wattelez, Guillaume (2024). GENEActiv accelerometer file related to the #120 OxWearables / stepcount issue [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11557420
    Explore at:
    Dataset updated
    Nov 25, 2024
    Dataset authored and provided by
    Wattelez, Guillaume
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    An example .bin file that raises an IndexError when processed.

    See OxWearables/stepcount issue #120 for more details.

    The .csv files are 1-second epoch conversions from the .bin file and contain time, x, y, z columns. The conversion was done by:

    reading the .bin with the GENEAread R package.

    keeping only the time, x, y and z columns.

    saving the data.frame into a .csv file.

    The only difference between the .csv files is the column format used for the time column before saving:

    time column in XXXXXX_....csv had a string class

    time column in XXXXXT....csv had a "POSIXct" "POSIXt" class
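
    When loading these 1-second epoch CSVs with pandas, both time-column formats can be normalized to a single datetime column; a small sketch with illustrative sample rows (not taken from the files):

    ```python
    import io
    import pandas as pd

    # Two illustrative rows: space-separated vs ISO 'T'-separated timestamps,
    # mimicking the string vs POSIXct exports described above
    csv_space = io.StringIO("time,x,y,z\n2024-01-01 00:00:01,0.1,0.0,0.9\n")
    csv_iso = io.StringIO("time,x,y,z\n2024-01-01T00:00:01,0.1,0.0,0.9\n")

    def load_epochs(src):
        df = pd.read_csv(src)
        # pd.to_datetime handles both layouts, yielding identical datetime64 values
        df["time"] = pd.to_datetime(df["time"])
        return df

    a, b = load_epochs(csv_space), load_epochs(csv_iso)
    print(a["time"].equals(b["time"]))  # True
    ```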

  13. Extracted data

    • figshare.com
    txt
    Updated Jul 4, 2024
    Cite
    Sofie S. Kristensen; Henrik Jörntell (2024). Extracted data [Dataset]. http://doi.org/10.6084/m9.figshare.26180284.v1
    Explore at:
    txt (available download formats)
    Dataset updated
    Jul 4, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Sofie S. Kristensen; Henrik Jörntell
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    For each recorded neuron there are:

    • All spike onset times (NAME_spikeTimes.csv)
    • All LFP-SPW onset times (NAME_lfpTimes.csv)
    • All ECoG-SPW onset times (NAME_eegTimes.csv)
    • A dataframe with stimulation onset times and descriptive statistics of LFP-SPWs and ECoG-SPWs preceding stimulations (NAME_df_new.csv)

  14. Data and code for the manuscript - The hidden biodiversity knowledge split...

    • zenodo.org
    Updated Apr 19, 2025
    Cite
    Anonymous Anonymous; Anonymous Anonymous (2025). Data and code for the manuscript - The hidden biodiversity knowledge split in biological collections [Dataset]. http://doi.org/10.5281/zenodo.15248066
    Explore at:
    Dataset updated
    Apr 19, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Anonymous Anonymous; Anonymous Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 19, 2025
    Description

    # General overview

    This repository contains the data and code used in the analysis of the
    manuscript entitled **"The hidden biodiversity knowledge split in biological collections"**.

    # Context

    Ecological and evolutionary processes generate biodiversity, yet how biodiversity data are organized and shared globally can shape our understanding of these processes. We show that name-bearing type specimens—the primary reference for species identity—of all freshwater and brackish fish species are predominantly housed in Global North museums, disconnected from their countries of origin. This geographical divide creates a ‘knowledge split’ with consequences for biodiversity science, particularly in the Global South, where researchers face barriers in studying native species’ name bearers housed abroad. Meanwhile, Global North collections remain flooded with non-native name bearers. We relate this imbalance to historical and socioeconomic factors, which ultimately restricts access to critical taxonomic reference materials and hinders global species documentation. To address this disparity, we call for international initiatives to promote fairer access to biological knowledge, including specimen repatriation, improved accessibility protocols for researchers in countries where specimens originated, and inclusive research partnerships.

    # Repository structure

    ## data

    This folder stores the raw and processed data used to perform all
    analyses presented in this study

    ### raw

    - `flow_period_region_country.csv` a data frame in long format
    containing the flow of NBTs (name-bearing types) per region per time
    period (50-year time frames). Variables:

    - `period` numeric variable representing 50-year time intervals

    - `region_type` character representing the name of the World Bank region
    of the country where the NBT was sourced

    - `country_type` character. A three letter code (alpha-3 ISO3166) representing
    the country of the museum where the NBT was sourced

    - `region_museum` character. Name of the World Bank region of the country
    where the NBT is housed

    - `country_museum` character. A three letter code (alpha-3 ISO3166) representing
    the country of the museum where the NBT is housed

    - `n` numeric. The number of NBT flowing from one country to another
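    As an illustration only (the sample rows below are invented; only the column names follow the data dictionary above), the per-period outflow of NBTs from each source region can be aggregated with pandas:

```python
import io

import pandas as pd

# Invented sample rows; only the column names come from the data dictionary above.
sample = io.StringIO(
    "period,region_type,country_type,region_museum,country_museum,n\n"
    "1850,Latin America & Caribbean,BRA,Europe & Central Asia,GBR,12\n"
    "1850,Latin America & Caribbean,BRA,North America,USA,7\n"
)
flows = pd.read_csv(sample)

# Total NBTs sourced from each region per 50-year period
out_flow = flows.groupby(["period", "region_type"])["n"].sum()
print(out_flow)
```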

    - `spp_native_distribution.csv` data frame in the long format
    containing the native composition at the country level. Variables:

    - `valid_name` character. The name of a species in the format genus_epithet
    according to the Catalog of Fishes

    - `country_distribution` character. Three letter code (alpha-3 ISO3166)
    indicating the name of the country where a species is native to

    - `region_distribution` character. The name of the World Bank region
    where a species is native

    - `spp_type_distribution.csv` data frame in the long format containing
    the composition of NBT by country. Variables:


    - `valid_name` character. The name of a species in the format genus_epithet
    according to the Catalog of Fishes

    - `country_distribution` character. Three letter code (alpha-3 ISO3166)
    indicating the name of the country where a species is housed

    - `region_distribution` character. The name of the World Bank region
    where a species is housed

    - `bio-dem_data.csv` data frame with data downloaded from
    [Bio-Dem](https://bio-dem.surge.sh/#awards) containing information
    on biological and social information at the country level. Variables:

    - `country` character. A three letter code (alpha-3 ISO3166) representing
    a country

    - `records` numeric. Total number of species occurrence records from the
    Global Biodiversity Information Facility (GBIF)

    - `records_per_area` numeric. Records per area from GBIF

    - `yearsSinceIndependence` numeric. Years since independence for each country

    - `e_migdppc` numeric. GDP per capita

    - `museum_data.csv` data frame with museums' acronyms and the world
    region of each. Variables:

    - `code_museum` character. The acronym (three letter code) of the museum

    - `country_museum` character. A three letter code (alpha-3 ISO3166) representing
    a country

    - `region_museum` character. The name of the World Bank region

    ### processed

    - `flow_region.csv` a data frame containing the flow of name bearers among world
    regions and the total number of name bearers derived from each source region

    - `flow_period_region.csv` a data frame with the number of name bearers between
    the world regions per 50-year time frame and the total number of name bearers
    in each time frame for each world region

    - `flow_period_region_prop.csv` a data frame with the number of name bearers,
    the Domestic Contribution and Domestic Retention between the world
    regions in a 50-year time frame - this is not used anymore in downstream analyses

    - `flow_region_prop.csv` data with the total number of species flowing
    between world regions, Domestic Contribution and Domestic Retention - this is no longer used in downstream analyses

    - `flow_country.csv` data frame with flow information of name bearers among
    countries

    - `df_country_native.csv` data frame with the number of native species
    at the country level

    - `df_country_type.csv` data frame with the number of name bearers at the
    country level

    - `df_all_beta.csv` data frame with values of endemic deficit and non-endemic
    representation at the country level

    ## R

    The letters `D`, `A` and `V` represent scripts for, respectively, data
    processing (D), data analysis (A) and results visualization (V). The
    script sequence to reproduce the workflow is indicated by the number at
    the beginning of each script file name

    - [`01_D_data_preparation.qmd`](R/01_D_data_preparation.qmd) initial data preparation

    - [`02_A_beta-endemics-countries.qmd`](R/02_A_beta-endemics-countries.qmd) analysis of endemic deficit and non-endemic representation. This script is used to calculate `native/endemic deficit` and `non-native/non-endemic representation`

    - [`03_D_data_preparation_models.qmd`](R/03_D_data_preparation_models.qmd) script used to build data frames that will be used in statistical models ([`04_A_model_NBTs.qmd`](R/04_A_model_NBTs.qmd))

    - [`04_A_model_NBTs.qmd`](R/04_A_model_NBTs.qmd) statistical models for the total number of name bearers, endemic deficit and non-endemic representation

    - [`05_V_chord_diagram_Fig1.qmd`](R/05_V_chord_diagram_Fig1.qmd) code used to produce the circular flow diagram. This is Figure 1 of the study

    - [`06_V_world_map_Fig1.qmd`](R/06_V_world_map_Fig1.qmd) code used to produce the world map in Figure 1 of the main text

    - [`08_V_beta_endemics_Fig3.qmd`](R/08_V_beta_endemics_Fig3.qmd) code used to build Figure 2 of the main text

    - [`09_V_model_Fig4.qmd`](R/09_V_model_Fig4.qmd) code used to build Figure 3 of the main text. This is the representation of the results of the models presented in the script [`04_A_model_NBTs.qmd`](R/04_A_model_NBTs.qmd)

    - [`0010_Supplementary_analysis.qmd`](R/0010_Supplementary_analysis.qmd) code to produce all tables and figures presented in the Supplementary material of this study

    ## output

    ### Figures

    In this folder you will find all figures used in the main text and supplementary material of this study

    - `Fig1_flow_circle_plot.png` Figure with circular plots showing the flux of name bearers among regions of the world in a 50-year time window

    - `Fig3_turnover_metrics_endemics.png` Cartogram with 3 maps showing the level of endemic deficit, non-endemic representation, and the combination of both metrics in a combined map

    - `Fig4_models.png` Figure showing the predictions of the number of name bearers, endemic deficit and non-endemic representation for different predictors. This is derived from the statistical models

    #### Supp-material

    This folder contains the figures in the Supplementary material

    - `FigS1_native_richness.png` World map with countries coloured according to native species richness, based on the Catalog of Fishes

    - `FigS3_turnover_metrics.png` Cartogram with 3 maps showing the level of
    native deficit, non-native representation and the combination of both metrics in a combined map

  15. Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...

    • zenodo.org
    • explore.openaire.eu
    zip
    Updated Oct 20, 2022
    Sofia Yfantidou; Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Stefanos Efstathiou; Athena Vakali; Athena Vakali; Joao Palotti; Joao Palotti; Dimitrios Panteleimon Giakatos; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas; Šarūnas Girdzijauskas; Christina Karagianni; Andrei Kazlouski; Elena Ferrari (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. http://doi.org/10.5281/zenodo.6832242
    Available download formats: zip
    Dataset updated
    Oct 20, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sofia Yfantidou; Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Stefanos Efstathiou; Athena Vakali; Athena Vakali; Joao Palotti; Joao Palotti; Dimitrios Panteleimon Giakatos; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas; Šarūnas Girdzijauskas; Christina Karagianni; Andrei Kazlouski; Elena Ferrari
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LifeSnaps Dataset Documentation

    Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data, will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.

    The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.

    Data Import: Reading CSV

    For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
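    For example (a minimal sketch; the actual LifeSnaps CSV file names and columns may differ, so an inline sample stands in for a real file):

```python
import io

import pandas as pd

# Inline stand-in for one of the daily-granularity CSV files;
# the column names here are hypothetical.
sample = io.StringIO(
    "id,date,steps\n"
    "p01,2021-05-24,11034\n"
    "p01,2021-05-25,8706\n"
)
df = pd.read_csv(sample, parse_dates=["date"])
print(df.dtypes)
```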

    Data Import: Setting up a MongoDB (Recommended)

    To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.

    To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have MongoDB Database Tools installed from here.

    For the Fitbit data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c fitbit 

    For the SEMA data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c sema 

    For surveys data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c surveys 

    If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
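    Once restored, the collections can be queried from Python, and since the documents are nested, pandas.json_normalize is handy for flattening them into a DataFrame. A hedged sketch (the document shape below is hypothetical, and the commented pymongo line assumes the local restore above):

```python
import pandas as pd

# With the database restored locally, documents could be fetched via pymongo, e.g.:
#   from pymongo import MongoClient
#   docs = list(MongoClient("localhost", 27017)["rais_anonymized"]["fitbit"].find().limit(100))
# Hypothetical document shape -- real LifeSnaps documents carry more fields:
docs = [
    {"_id": "a1", "type": "steps", "data": {"dateTime": "2021-05-24", "value": 11034}},
    {"_id": "a2", "type": "steps", "data": {"dateTime": "2021-05-25", "value": 8706}},
]
df = pd.json_normalize(docs)  # nested keys become dotted column names
print(df.columns.tolist())
```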

    Data Availability

    The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:

    {
      _id: 
  16. Street Network Data- Newyork City

    • kaggle.com
    Updated Feb 25, 2023
    Arya Nandakumar (2023). Street Network Data- Newyork City [Dataset]. https://www.kaggle.com/datasets/aryanandakumar/street-network-data-newyork-city/data
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 25, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Arya Nandakumar
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Description: This data set contains the street network of New York City, retrieved using Osmnx from OpenStreetMap and converted to a GeoPandas DataFrame. The street network is represented as a series of linestrings that connect nodes representing intersections in the road network. The data set can be used for a variety of purposes, such as urban planning, transportation analysis, and spatial modeling.

    Source: The data set was retrieved using Osmnx, a Python package for downloading and analyzing OpenStreetMap data, and converted to a GeoPandas DataFrame using the osmnx.graph_to_gdfs() function. OpenStreetMap: https://www.openstreetmap.org/#map=4/21.82/82.79

    Date: The data set was retrieved on February 24, 2023, and represents the street network of New York City as of that date.

    Format: Comma-separated values (CSV) file.

    Attributes: The data set includes various attributes for nodes and edges, including geographic coordinates, street names, length, and directionality

    The data retrieved using Osmnx can be used for a variety of purposes, including urban planning, transportation engineering, and spatial analysis.
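    Because the network is shipped as CSV rather than a spatial format, the geometries presumably round-trip through WKT text. A hedged sketch for rebuilding them (the column name `geometry` and the WKT encoding are assumptions about the export):

```python
import pandas as pd
from shapely import wkt

# Stand-in for a row of the edges CSV; real rows carry OSM attributes too.
edges = pd.DataFrame({
    "name": ["Broadway"],
    "geometry": ["LINESTRING (-73.99 40.73, -73.98 40.74)"],
})
# Parse the WKT strings back into shapely geometry objects
edges["geometry"] = edges["geometry"].apply(wkt.loads)
print(edges["geometry"].iloc[0].geom_type)
```

    From there, if geopandas is available, `geopandas.GeoDataFrame(edges, geometry="geometry")` restores a spatial frame (the CRS would still need to be set manually).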

  17. Using seed morphological traits to predict early performance using...

    • researchdata.edu.au
    Updated Sep 11, 2024
    Gallagher Rachael; Tetu Sasha G.; Mills Charlotte H.; Lieurance Paige; Andres Samantha; Samantha E. Andres; Rachael Gallagher; Paige Elizabeth Lieurance (2024). Using seed morphological traits to predict early performance using pelletized seed enhancement technologies in restoration practice [Dataset]. http://doi.org/10.17605/OSF.IO/5WC4Q
    Dataset updated
    Sep 11, 2024
    Dataset provided by
    Western Sydney University
    OSF
    Authors
    Gallagher Rachael; Tetu Sasha G.; Mills Charlotte H.; Lieurance Paige; Andres Samantha; Samantha E. Andres; Rachael Gallagher; Paige Elizabeth Lieurance
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    Manuscript, data, and code associated with a germination experiment using seed enhancement technologies in New South Wales, Australia.

    Two scripts are provided for use in R:

    1. 'treatment_comparisons.txt' details treatment-wise comparisons of emergence, survival, and average time to emergence between treatments: (1) bare seed and (2) pelletised replicates of native species
    2. 'trait_script.txt' details comparisons of seed morphological traits as predictors of species performance using pellets

    Three major dataframes are provided:

    - Emergence_data.csv - raw emergence data from the experiment
    - seed_traits_no_se.csv - average seed morphological trait information from x-ray images
    - emergence_traits.csv - emergence speed data from species in the experiment

    Three supporting dataframes are provided:

    - Amenability.csv - characterised amenability
    - results_bin.csv - dataframe based on treatment models to use in plotting results
    - pairwise_letters.csv - dataframe based on treatment models to use in plotting results

  18. Expression vs genomics for predicting dependencies

    • figshare.com
    hdf
    Updated May 17, 2024
    Broad DepMap (2024). Expression vs genomics for predicting dependencies [Dataset]. http://doi.org/10.6084/m9.figshare.25843450.v1
    Available download formats: hdf
    Dataset updated
    May 17, 2024
    Dataset provided by
    figshare
    Authors
    Broad DepMap
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset supports the "Gene expression has more power for predicting in vitro cancer cell vulnerabilities than genomics" preprint by Dempster et al. To generate the figure panels seen in the preprint using these data, use FigurePanelGeneration.ipynb. This study includes five datasets (citations and details in manuscript):

    - Achilles: the Broad Institute's DepMap public 19Q4 CRISPR knockout screens processed with CERES
    - Score: the Sanger Wellcome Institute's Project Score CRISPR knockout screens processed with CERES
    - RNAi: the DEMETER2-processed combined dataset, which includes RNAi data from Achilles, DRIVE, and the Marcotte breast screens
    - PRISM: the PRISM pooled in vitro repurposing primary screen of compounds
    - GDSC17: cancer drug in vitro screens performed by Sanger

    The files of most interest to a biologist are Summary.csv. If you are interested in trying machine learning, the files Features.hdf5 and Target.hdf5 contain the data munged in a convenient form for standard supervised machine learning algorithms.

    Some large files are in the binary format hdf5 for efficiency in space and read-in. These files each contain three named hdf5 datasets: "dim_0" holds the row/index names as an array of strings, "dim_1" holds the column names as an array of strings, and "data" holds the matrix contents as a 2D array of floats.
    In Python, these files can be read in with:

      import numpy as np
      import pandas as pd
      import h5py

      def read_hdf5(filename):
          src = h5py.File(filename, 'r')
          try:
              dim_0 = [x.decode('utf8') for x in src['dim_0']]
              dim_1 = [x.decode('utf8') for x in src['dim_1']]
              data = np.array(src['data'])
              return pd.DataFrame(index=dim_0, columns=dim_1, data=data)
          finally:
              src.close()

    Files (not every dataset will have every type of file listed below):

    - AllFeaturePredictions.hdf5: Matrix of cell lines by perturbations, with values indicating the predicted viability using a model with all feature types.
    - ENAdditionScore.csv: A matrix of perturbations by number of features. Values indicate an elastic net model performance (Pearson correlation of concatenated out-of-sample predictions with the values given in Target.hdf5) using only the top X features, where X is the column header.
    - FeatureDropScore.csv: Perturbations and predictive performance for a model using all single gene expression features EXCEPT those that had greater than 0.1 feature importance in a model trained with all single gene expression features.
    - Features.hdf5: A very large matrix of all cell lines by all used CCLE cell features. Continuous features were z-scored. Cell lines missing mutation or expression data were dropped. Remaining NA values were imputed to zero. Feature types are indicated by the column suffixes:
      - _Exp: expression
      - _Hot: hotspot mutation
      - _Dam: damaging mutation
      - _OtherMut: other mutation
      - _CN: copy number
      - _GSEA: ssGSEA score for an MSigDB gene set
      - _MethTSS: methylation of transcription start sites
      - _MethCpG: methylation of CpG islands
      - _Fusion: gene fusions
      - _Cell: cell tissue properties
    - NormLRT.csv: the normLRT score for the given perturbation
    - RFAdditionScore.csv: similar to ENAdditionScore, but using a random forest model.
    - Summary.csv: A dataframe containing predictive model results.
    Columns of Summary.csv:

    - model: Specifies the collection of features used (Expression, Mutation, Exp+CN, etc.)
    - gene: The perturbation (column in Target.hdf5) examined. Actually a compound for the PRISM and GDSC17 datasets.
    - overall_pearson: Pearson correlation of concatenated out-of-sample predictions with the values given in Target.hdf5
    - feature: the Nth most important feature, found by retraining the model with all cell lines (N = 0-9)
    - feature_importance: the feature importance as assessed by sklearn's RandomForestRegressor

    - Target.hdf5: A matrix of cell lines by perturbations, with entries indicating post-perturbation viability scores. Note that the scales of the viability effects are different for different datasets. See manuscript methods for details.
    - PerturbationInfo.csv: Additional drug annotations for the PRISM and GDSC17 datasets
    - ApproximateCFE.hdf5: A set of Cancer Functional Event cell features based on CCLE data, adapted from Iorio et al. 2016 (10.1016/j.cell.2016.06.017)
    - DepMapSampleInfo.csv: sample info from DepMap_public_19Q4 data, reproduced here as a convenience.
    - GeneRelationships.csv: A list of genes and their related (partner) genes, with the type of relationship (self, protein-protein interaction, CORUM complex membership, paralog).
    - OncoKB_oncogenes.csv: A list of genes that have non-expression-based alterations listed as likely oncogenic or oncogenic by OncoKB as of 9 May 2018.

  19. Philaenus spumarius and other spittlebugs in Trentino. Italy

    • data.mendeley.com
    Updated Sep 21, 2021
    Sabina Avosani (2021). Philaenus spumarius and other spittlebugs in Trentino. Italy [Dataset]. http://doi.org/10.17632/7rv4czkykr.2
    Dataset updated
    Sep 21, 2021
    Authors
    Sabina Avosani
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Italy
    Description

    This dataset is associated with the article "Occupancy and detection of agricultural threats: the case of Philaenus spumarius, European vector of Xylella fastidiosa" by the same authors, published in JOURNAL (2021). The data on Philaenus spumarius and other co-occurring species were collected in Trentino, Italy, during spring and summer 2018 in olive orchards and vineyards. Provided here are the raw data, some preprocessed data, and the R code used for the analyses presented in the publication. Please refer to the above-mentioned article for more details.

    List of files:

    - samplings.xlsx - original dataset of field sampling (sheet: survey), site coordinates and info (sheet: info site), and metadata (sheet: legenda)
    - counts_per_site.csv - occupancy abundance dataframe for P. spumarius
    - philaenus_occupancy_data.csv - occupancy presence dataframe for P. spumarius
    - sites.cov.csv - site covariates for the occupancy model
    - observation.cov.csv - observation covariates for the occupancy model
    - Rcode.zip - commented code and data in R format to run occupancy models for P. spumarius

  20. Data files for: The effect of building ability and object availability on...

    • figshare.com
    txt
    Updated Sep 21, 2023
    Menno van Berkel (2023). Data files for: The effect of building ability and object availability on the construction of bower courts in great bowerbirds [Dataset]. http://doi.org/10.6084/m9.figshare.24175197.v2
    Available download formats: txt
    Dataset updated
    Sep 21, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Menno van Berkel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    - data.csv: the main data frame used for the primary analysis, in long format
    - data_combined.csv: the main data frame used and formatted for the primary analysis
    - data_spatial.csv: the data frame used for the spatial autocorrelation
    - README file: columns of the data frames explained

Rupesh Kumar (2022). Learn Data Science Series Part 1 [Dataset]. https://www.kaggle.com/datasets/hunter0007/learn-data-science-part-1

Learn Data Science Series Part 1

This module contains learning material to master Pandas

Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Dec 30, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Rupesh Kumar
License

CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

Description

Please feel free to share it with others and consider supporting me if you find it helpful ⭐️.

Overview:

  • Chapter 1: Getting started with pandas
  • Chapter 2: Analysis: Bringing it all together and making decisions
  • Chapter 3: Appending to DataFrame
  • Chapter 4: Boolean indexing of dataframes
  • Chapter 5: Categorical data
  • Chapter 6: Computational Tools
  • Chapter 7: Creating DataFrames
  • Chapter 8: Cross sections of different axes with MultiIndex
  • Chapter 9: Data Types
  • Chapter 10: Dealing with categorical variables
  • Chapter 11: Duplicated data
  • Chapter 12: Getting information about DataFrames
  • Chapter 13: Gotchas of pandas
  • Chapter 14: Graphs and Visualizations
  • Chapter 15: Grouping Data
  • Chapter 16: Grouping Time Series Data
  • Chapter 17: Holiday Calendars
  • Chapter 18: Indexing and selecting data
  • Chapter 19: IO for Google BigQuery
  • Chapter 20: JSON
  • Chapter 21: Making Pandas Play Nice With Native Python Datatypes
  • Chapter 22: Map Values
  • Chapter 23: Merge, join, and concatenate
  • Chapter 24: Meta: Documentation Guidelines
  • Chapter 25: Missing Data
  • Chapter 26: MultiIndex
  • Chapter 27: Pandas Datareader
  • Chapter 28: Pandas IO tools (reading and saving data sets)
  • Chapter 29: pd.DataFrame.apply
  • Chapter 30: Read MySQL to DataFrame
  • Chapter 31: Read SQL Server to Dataframe
  • Chapter 32: Reading files into pandas DataFrame
  • Chapter 33: Resampling
  • Chapter 34: Reshaping and pivoting
  • Chapter 35: Save pandas dataframe to a csv file
  • Chapter 36: Series
  • Chapter 37: Shifting and Lagging Data
  • Chapter 38: Simple manipulation of DataFrames
  • Chapter 39: String manipulation
  • Chapter 40: Using .ix, .iloc, .loc, .at and .iat to access a DataFrame
  • Chapter 41: Working with Time Series