https://creativecommons.org/publicdomain/zero/1.0/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The training data for reimplementing inDecay and FORECasT. The FASTA file records the guide RNA, strand, cut site, and target sequence, matched by OligoID. The indelgen folder contains one indelgen file per OligoID; each indelgen file records all possible indel events estimated from the target sequence. Finally, there are five processed dataframes (very large CSV files) containing all observed events and their frequencies.
Libraries Import:
- Importing necessary libraries such as pandas, seaborn, matplotlib, scikit-learn's KMeans, and warnings.

Data Loading and Exploration:
- Reading the "Mall_Customers.csv" dataset into a pandas DataFrame (df).
- Displaying the first few rows of the dataset using df.head().
- Calculating descriptive statistics with df.describe().

Univariate Analysis:
- Visualizing the distribution of the 'Annual Income (k$)' column using sns.distplot.
- Looping through selected columns ('Age', 'Annual Income (k$)', 'Spending Score (1-100)') and plotting individual distribution plots.

Bivariate Analysis:
- Creating a scatter plot of 'Annual Income (k$)' vs. 'Spending Score (1-100)' using sns.scatterplot.
- Generating a pair plot for selected columns with gender differentiation using sns.pairplot.

Gender-Based Analysis:
- Grouping the data by 'Gender' and calculating the mean of selected columns.
- Computing the correlation matrix for the grouped data and visualizing it with a heatmap.

Univariate Clustering:
- Applying KMeans clustering with 3 clusters based on 'Annual Income (k$)' and adding an 'Income Cluster' column to the DataFrame.
- Plotting the elbow method to determine the optimal number of clusters.

Bivariate Clustering:
- Applying KMeans clustering with 5 clusters based on 'Annual Income (k$)' and 'Spending Score (1-100)' and adding a 'Spending and Income Cluster' column.
- Plotting the elbow method for bivariate clustering and visualizing the cluster centers on a scatter plot.
- Displaying a normalized cross-tabulation between 'Spending and Income Cluster' and 'Gender'.

Multivariate Clustering:
- Creating dummy variables, scaling selected columns, and applying KMeans clustering.
- Plotting the elbow method for multivariate clustering.

Result Saving:
- Saving the modified DataFrame with cluster information to "Result.csv".
- Saving the multivariate clustering plot as "Multivariate_figure.png".
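A condensed sketch of the bivariate clustering step described above, assuming the standard Mall_Customers.csv column names (elbow plot and other visualizations omitted for brevity):

```python
import pandas as pd
from sklearn.cluster import KMeans

df = pd.read_csv("Mall_Customers.csv")
X = df[["Annual Income (k$)", "Spending Score (1-100)"]]

# Elbow method: within-cluster sum of squares (inertia) for k = 1..10
inertia = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
           for k in range(1, 11)]

# Final bivariate model with 5 clusters, as in the notebook
km = KMeans(n_clusters=5, n_init=10, random_state=42)
df["Spending and Income Cluster"] = km.fit_predict(X)

df.to_csv("Result.csv", index=False)
```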
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Original and derived data products referenced in the original manuscript are provided in the data package.
Original data:
- Table_1_source_papers.csv: Papers that met review criteria and which are summarized in Table 1 of the manuscript.

Derived data:
- change_livestock_country.csv: A dataframe containing values used to generate Figure 4a in the manuscript.
- country_avg_schist_wormy_world.csv: A dataframe containing values used to generate Figure 3 in the manuscript.
- kenya_precip_change_1951_2020.csv: A dataframe containing values used to generate Figure 4b in the manuscript.
Data were derived from the following sources:
Ogutu, J. O., Piepho, H.-P., Said, M. Y., Ojwang, G. O., Njino, L. W., Kifugo, S. C., & Wargute, P. W. (2016). Extreme wildlife declines and concurrent increase in livestock numbers in Kenya: What are the causes? PloS ONE, 11(9), e0163249. https://doi.org/10.1371/journal.pone.0163249
London Applied & Spatial Epidemiology Research Group (LASER). (2023). Global Atlas of Helminth Infections: STH and Schistosomiasis [dataset]. London School of Hygiene and Tropical Medicine. https://lshtm.maps.arcgis.com/apps/webappviewer/index.html?id=2e1bc70731114537a8504e3260b6fbc0
World Bank Group. (2023). Climate Data & Projections—Kenya. Climate Change Knowledge Portal. https://climateknowledgeportal.worldbank.org/country/kenya/climate-data-projections
Based on the dblp XML file, this dataset consists of a CSV file extracted using a Python script. The dataset can be easily loaded into a pandas (Python Data Analysis Library) DataFrame.
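For example (the file name below is a placeholder; substitute the actual CSV shipped with the dataset):

```python
import pandas as pd

# Load the extracted dblp records; "dblp.csv" is a hypothetical name
df = pd.read_csv("dblp.csv")
print(df.head())
```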
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a corpus of 56,416 unique privacy policy texts spanning the years 1996-2021.
policy-texts.zip contains a directory of text files with the policy texts. File names are the hashes of the policy text.
policy-metadata.zip contains two CSV files (can be imported into a pandas dataframe) with policy metadata including readability measures for each policy text.
labeled-policies.zip contains CSV files with content labels for each policy. Labeling was done using a BERT classifier.
Details on the methodology can be found in the accompanying paper.
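The metadata CSVs can be read straight from the archive; a minimal sketch, assuming only that policy-metadata.zip contains the two CSV files described above:

```python
import zipfile
import pandas as pd

# Read both metadata CSVs without unpacking the archive
with zipfile.ZipFile("policy-metadata.zip") as zf:
    metadata = {name: pd.read_csv(zf.open(name))
                for name in zf.namelist() if name.endswith(".csv")}
```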
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is associated with this HiPR-FISH Spatial Mapping of Cheese Rind Microbial Communities pub from Arcadia Science.
HiPR-FISH spatial imaging was used to look at the distribution of microbes within five distinct microbial communities growing on the surface of aged cheeses. Probe design and imaging were performed by Kanvas Biosciences.
This dataset includes the following:
For each field of view (roughly 135µm x 135µm; 7 FOVs per cheese specimen):
A fluorescence intensity image (*_spectral_max_projection.png/.tif).
A pseudo-colored, microbe-labeled image (*_identification.png/.tif).
A data frame containing each identified microbe's identity, position, and size (*_cell_information.csv).
A segmented mask for microbiota (*_segmentation.png/.tif).
A spatial proximity graph for each pair of species in close proximity, showing spatial enrichment over a random distribution (*_spatialheatmap.png).
A corresponding data frame used to generate the spatial proximity graph (*_absolute_spatial_association.csv) and a data frame for the average of 500 random shuffles of the taxa (*_randomized_spatial_association_matrix.csv).
For each cheese specimen:
A widefield image with FOVs located on the image (*_WF_overlay.png).
In general:
A PNG showing the color legend for each species (ARC1_taxa_color_legend.png).
A data frame showing the environmental location of each FOV in the cheese (RIND/CURD) and the location of each FOV relative to FOV 1 (ARC1_Cheese_Map.csv).
A vignette showing an example of each cell and its false coloring according to its taxonomic identification (ARC1_detected_species_representative_cell_vignette.png).
Sequences used as input in probe design (16S_18S_forKanvas.fasta).
A CSV file containing the sequences that belong to each ASV (ARC1_sequences_to_ASVs.csv).
Plots of log-transformed counts for each microbe detected across all FOVs, and broken down for each cheese (*detected_species_absolute_abundance.png).
CSVs containing pairwise correlation of FOVs based on spatial association (ARC1_spatial_association_FOV_correlation.csv) and microbial abundance (ARC1_abundance_FOV_correlation.csv).
Plots of spatial association matrices, aggregated for different cheeses and different locations (RIND vs CURD) (*samples_*loc_relative_spatial_association.png).
CSV containing the principal component coordinates for each FOV (ARC1_abundance_FOV_PCA.csv, ARC1_spatial_association_FOV_PCA.csv).
CSV containing the mean fold-change in number of edges between each ASV and the corresponding p-value when compared to the null state (random spatial association matrices) (ARC1_spatial_enrichment_significance.csv).
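As an illustration, the per-FOV cell tables can be concatenated for simple community summaries; a sketch assuming the *_cell_information.csv files sit in the working directory and that a taxon-identity column exists (the column name used here is hypothetical):

```python
import glob
import pandas as pd

# Combine cell information across all fields of view
frames = [pd.read_csv(f) for f in glob.glob("*_cell_information.csv")]
cells = pd.concat(frames, ignore_index=True)

# Cells detected per taxon across all FOVs ("taxon" is a hypothetical column name)
print(cells.groupby("taxon").size().sort_values(ascending=False))
```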
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Download the dataset
At the moment, to download the dataset you should use a pandas DataFrame:

```python
import pandas as pd

df = pd.read_csv("https://huggingface.co/datasets/cybernetic-m/oldIT2modIT/resolve/main/oldIT2modIT_dataset.csv")
```

You can visualize the dataset with:

```python
df.head()
```

To convert it into a Hugging Face dataset:

```python
from datasets import Dataset

dataset = Dataset.from_pandas(df)
```
Dataset Description
This is an Italian dataset formed by 200 old (ancient) Italian sentences and… See the full description on the dataset page: https://huggingface.co/datasets/cybernetic-m/oldIT2modIT.
Globally rising livestock populations and declining wildlife numbers are likely to dramatically change disease risk for wildlife and livestock, especially at resources where they congregate. However, limited understanding of interspecific transmission dynamics at these hotspots hinders disease prediction or mitigation. In this study, we combined gastrointestinal nematode density and host foraging activity measurements from our prior work in this system with three estimates of parasite-sharing capacity to investigate how interspecific exposures alter the relative riskiness of an important resource – water – among cattle and five dominant herbivore species in an East African tropical savanna. We found that due to their high parasite output, water dependence, and parasite-sharing capacity, cattle greatly increased potential parasite exposures at water sources for wild ruminants. When untreated for parasites, cattle accounted for over two-thirds of total potential exposures around water fo…

# Dataset for Cattle aggregations at shared resources create potential parasite exposure hotspots for wildlife
https://doi.org/10.5061/dryad.vdncjsz28
These data accompany the publication "Cattle aggregations at shared resources create potential parasite exposure hotspots for wildlife" in Proceedings of the Royal Society B: Biological Sciences (doi: 10.1098/rspb.2023-2239).
The data include three data files and code to replicate results of the publication. Specifically, the data files are:
This resource contains "RouteLink" files for version 2.1.6 of the National Water Model which are used to associate feature identifiers for computational reaches to relevant metadata. These data are important for comparing NWM feature data to USGS streamflow and lake observations. The original RouteLink files are in NetCDF format and available here: https://www.nco.ncep.noaa.gov/pmb/codes/nwprod
This resource includes the files in a human-friendlier CSV format for easier use, and a machine-friendlier file in HDF5 format which contains a single pandas.DataFrame. The scripts and supporting utilities are also included for users that wish to rebuild these files. Source code is hosted here: https://github.com/jarq6c/NWM_RouteLinks
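Either format loads directly into pandas; a minimal sketch with hypothetical file names (the HDF5 file stores a single pandas.DataFrame, so no key is required):

```python
import pandas as pd

# CSV version: one row per NWM reach with its associated metadata
routelink_csv = pd.read_csv("routelink.csv")  # hypothetical file name

# HDF5 version: pandas infers the key when the file holds a single DataFrame
routelink_h5 = pd.read_hdf("routelink.h5")  # hypothetical file name
```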
Attribution 1.0 (CC BY 1.0) https://creativecommons.org/licenses/by/1.0/
License information was derived automatically
Speed profiles of freeways in California (I5-S and I210-E). Original data is retrieved from PeMS.
Each YEAR_FREEWAY.csv file contains Timestamp and Speed data.
The freeway_meta.csv file contains meta information for each detector: freeway number, direction, detector ID, absolute milepost, and x/y coordinates.
# Freeway speed data description
### Data loading example (single freeway: I5-S 2012)
```python
%%time
import pandas as pd
from datetime import datetime

# Date time parser for the Timestamp column
mydateparser = lambda x: datetime.strptime(x, "%m/%d/%Y %H:%M:%S")

# Freeway data loading (This part should be changed to a proper URL in zenodo.org)
data = pd.read_csv("dataset/2012_I5S.csv",
                   parse_dates=["Timestamp"],
                   date_parser=mydateparser).pivot(index="Timestamp", columns="Station_ID", values="Speed")

# Meta data loading
meta = pd.read_csv("dataset/freeway_meta.csv").set_index(["Fwy", "Dir"])
```
CPU times: user 50.5 s, sys: 911 ms, total: 51.4 s
Wall time: 50.9 s
### Speed data and meta data
```python
data.head()
```
```
Station_ID              1     2     3     4     5     6     7     8     9    10  ...    80    81    82    83    84    85    86    87    88    89
Timestamp                                                                        ...
2012-01-01 06:00:00  70.0  69.8  70.1  69.6  69.9  70.8  70.1  69.3  69.2  68.2  ...  72.1  67.6  71.0  66.8  65.9  58.2  67.1  63.8  67.1  71.6
2012-01-01 06:05:00  69.2  69.8  69.8  69.4  69.5  69.5  68.3  67.5  67.4  67.2  ...  71.5  66.1  69.5  67.4  68.3  59.0  66.9  60.8  66.6  65.7
2012-01-01 06:10:00  69.2  69.0  68.6  68.7  68.6  68.9  61.7  68.3  67.4  67.7  ...  71.1  65.2  71.2  66.5  65.4  59.6  66.3  58.4  68.2  65.6
2012-01-01 06:15:00  69.9  69.6  69.7  69.2  69.0  69.1  65.3  67.6  67.1  66.8  ...  69.9  67.1  69.3  66.9  68.2  60.6  66.0  55.5  67.1  69.7
2012-01-01 06:20:00  68.7  68.4  68.2  67.9  68.3  69.3  67.0  68.4  68.2  68.2  ...  70.9  67.2  69.9  65.6  66.7  62.8  66.2  62.6  67.2  67.5

[5 rows x 89 columns]
```
```python
meta.head()
```
```
         ID  Abs_mp   Latitude   Longitude
Fwy Dir
5   S     1   0.058  32.542731 -117.030501
    S     2   0.146  32.543587 -117.031769
    S     3   1.291  32.552409 -117.048120
    S     4   2.222  32.558422 -117.062360
    S     5   2.559  32.561106 -117.067228
```
### Choose a day
```python
# Sampling (2012-01-13)
myday = "2012-01-13"
# Filter the data by the day
myday_speed_data = data.loc[myday]
```
### A speed profile
```python
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
# Axis value setting
mp = meta[meta.ID.isin(data.columns)].Abs_mp
hour = myday_speed_data.index
# Draw the day
fig, ax = plt.subplots()
heatmap = ax.pcolormesh(hour,mp,myday_speed_data.T, cmap=plt.cm.RdYlGn, vmin=0, vmax=80, alpha=1)
plt.colorbar(heatmap, ax=ax)
# Appearance setting
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H"))
plt.title(pd.Timestamp(myday).strftime("%Y-%m-%d [%a]"))
plt.xlabel("hour")
plt.ylabel("milepost")
plt.show()
```

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
An example of a .bin file that raises an IndexError when processed.
See OxWearables/stepcount issue #120 for more details.
The .csv files are 1-second epoch conversions from the .bin file and contain time, x, y, z columns. The conversion was done by:
- reading the .bin file with the GENEAread R package,
- keeping only the time, x, y and z columns,
- saving the data.frame into a .csv file.
The only difference between the .csv files is the column format used for the time column before saving:
- the time column in XXXXXX_....csv had a string class
- the time column in XXXXXT....csv had a "POSIXct" "POSIXt" class
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For each recorded neuron there are:
- All spike onset times (NAME_spikeTimes.csv)
- All LFP-SPW onset times (NAME_lfpTimes.csv)
- All ECoG-SPW onset times (NAME_eegTimes.csv)
- A dataframe with stimulation onset times and descriptive statistics of the LFP-SPWs and ECoG-SPWs preceding stimulations (NAME_df_new.csv)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
# General overview
This repository contains the data and code used in the analysis of the
manuscript entitled **"The hidden biodiversity knowledge split in biological collections"**.
# Context
Ecological and evolutionary processes generate biodiversity, yet how biodiversity data are organized and shared globally can shape our understanding of these processes. We show that name-bearing type specimens—the primary reference for species identity—of all freshwater and brackish fish species are predominantly housed in Global North museums, disconnected from their countries of origin. This geographical divide creates a ‘knowledge split’ with consequences for biodiversity science, particularly in the Global South, where researchers face barriers in studying native species’ name bearers housed abroad. Meanwhile, Global North collections remain flooded with non-native name bearers. We relate this imbalance to historical and socioeconomic factors, which ultimately restricts access to critical taxonomic reference materials and hinders global species documentation. To address this disparity, we call for international initiatives to promote fairer access to biological knowledge, including specimen repatriation, improved accessibility protocols for researchers in countries where specimens originated, and inclusive research partnerships.
# Repository structure
## data
This folder stores raw and processed data used to perform all the
analysis presented in this study
### raw
- `flow_period_region_country.csv` a data frame in long format
containing the flow of NBTs from region to region per time period
(50-year time frames). Variables:
- `period` numeric variable representing 50-year time intervals
- `region_type` character representing the name of the World Bank region
of the country where the NBT was sourced
- `country_type` character. A three letter code (alpha-3 ISO3166) representing
the country of the museum where the NBT was sourced
- `region_museum` character. Name of the World Bank region of the country
where the NBT is housed
- `country_museum` character. A three letter code (alpha-3 ISO3166) representing
the country of the museum where the NBT is housed
- `n` numeric. The number of NBT flowing from one country to another
- `spp_native_distribution.csv` data frame in the long format
containing the native composition at the country level. Variables:
- `valid_name` character. The name of a species in the format genus_epithet
according to the Catalog of Fishes
- `country_distribution` character. Three letter code (alpha-3 ISO3166)
indicating the name of the country where a species is native to
- `region_distribution` character. The name of the World Bank region
where a species is native
- `spp_type_distribution.csv` data frame in the long format containing
the composition of NBT by country. Variables:
- `valid_name` character. The name of a species in the format genus_epithet
according to the Catalog of Fishes
- `country_distribution` character. Three letter code (alpha-3 ISO3166)
indicating the name of the country where a species is housed
- `region_distribution` character. The name of the World Bank region
where a species is housed
- `bio-dem_data.csv` data frame with data downloaded from
[Bio-Dem](https://bio-dem.surge.sh/#awards) containing information
on biological and social information at the country level. Variables:
- `country` character. A three letter code (alpha-3 ISO3166) representing
a country
- `records` numeric. Total number of species occurrence records from the
Global Biodiversity Information Facility (GBIF)
- `records_per_area` numeric. Records per area from GBIF
- `yearsSinceIndependence` numeric. Years since independence for each country
- `e_migdppc` numeric. GDP per capita
- `museum_data.csv` data frame with museums' acronyms and the world
region of each. Variables:
- `code_museum` character. The acronym (three letter code) of the museum
- `country_museum` character. A three letter code (alpha-3 ISO3166) representing
a country
- `region_museum` character. The name of the region acording with
World Bank
### processed
- `flow_region.csv` a data frame containing the flow of name bearers among
world regions and the total number of name bearers derived from the source region
- `flow_period_region.csv` a data frame with the number of name bearers between
the world regions per 50-year time frame and the total number of name bearers
in each time frame for each world region
- `flow_period_region_prop.csv` a data frame with the number of name bearers,
the Domestic Contribution and Domestic Retention between the world
regions in a 50-year time frame - this is not used anymore in downstream analyses
- `flow_region_prop.csv` data with the total number of species flowing
between world regions, Domestic Contribution and Domestic Retention - this is no longer used in downstream analyses
- `flow_country.csv` data frame with flow information of name bearers among
countries
- `df_country_native.csv` data frame with the number of native species
at the country level
- `df_country_type.csv` data frame with the number of name bearers at the
country level
- `df_all_beta.csv` data frame with values of endemic deficit and non-endemic
representation at the country level
## R
The letters `D`, `A` and `V` represent scripts for, respectively, data
processing (D), data analysis (A) and results visualization (V). The
script sequence to reproduce the workflow is indicated by the numbers at
the beginning of each script file name
- [`01_D_data_preparation.qmd`](R/01_D_data_preparation.qmd) initial data preparation
- [`02_A_beta-endemics-countries.qmd`](R/02_A_beta-endemics-countries.qmd) analysis of endemic deficit and non-endemic representation. This script is used to calculate `native/endemic deficit` and `non-native/non-endemic representation`
- [`03_D_data_preparation_models.qmd`](R/03_D_data_preparation_models.qmd) script used to build data frames that will be used in statistical models ([`04_A_model_NBTs.qmd`](R/04_A_model_NBTs.qmd))
- [`04_A_model_NBTs.qmd`](R/04_A_model_NBTs.qmd) statistical models for the total number of name bearers, endemic deficit and non-endemic representation
- [`05_V_chord_diagram_Fig1.qmd`](R/05_V_chord_diagram_Fig1.qmd) code used to produce circular flow diagram. This is the Figure 1 of the study
- [`06_V_world_map_Fig1.qmd`](R/06_V_world_map_Fig1.qmd) code used to produce the world map in the Figure 1 of the main text
- [`08_V_beta_endemics_Fig3.qmd`](R/08_V_beta_endemics_Fig3.qmd) code used to build Figure 2 of the main text
- [`09_V_model_Fig4.qmd`](R/09_V_model_Fig4.qmd) code used to build the Figure 3 of the main text. This is the representation of the results of the models present in the script [`04_A_model_NBTs.qmd`](R/04_A_model_NBTs.qmd)
- [`0010_Supplementary_analysis.qmd`](R/0010_Supplementary_analysis.qmd) code to produce all the tables and figures presented in the Supplementary material of this study
## output
### Figures
In this folder you will find all figures used in the main text and supplementary material of this study
- `Fig1_flow_circle_plot.png` Figure with circular plots showing the flux of name bearers among regions of the world in a 50-year time window
- `Fig3_turnover_metrics_endemics.png` Cartogram with 3 maps showing the level of endemic deficit, non-endemic representation, and the combination of both metrics in a combined map
- `Fig4_models.png` Figure showing the predictions of the number of name bearers, endemic deficit and non-endemic representation for different predictors. This is derived from the statistical models
#### Supp-material
This folder contains the figures in the Supplementary material
- `FigS1_native_richness.png` World map with countries coloured according to native species richness, based on the Catalog of Fishes
- `FigS3_turnover_metrics.png` Cartogram with 3 maps showing the level of
native deficit, non-native representation and the combination of both metrics in a combined map
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LifeSnaps Dataset Documentation
Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in-the-wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral and psychological measurements, due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset containing a plethora of anthropological data, collected unobtrusively over the course of more than 4 months by n=71 participants under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second-level to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data openly available to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.
The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.
Data Import: Reading CSV
For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
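For example (the file name below is a placeholder for any of the released CSVs):

```python
import pandas as pd

# Load one of the daily/hourly granularity CSVs into a DataFrame
df = pd.read_csv("daily_granularity.csv")  # placeholder file name
print(df.head())
```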
Data Import: Setting up a MongoDB (Recommended)
To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.
To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have MongoDB Database Tools installed from here.
For the Fitbit data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c fitbit
For the SEMA data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c sema
For surveys data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c surveys
If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
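Once restored, the collections can be queried from Python; a minimal sketch using pymongo with the database and collection names given above:

```python
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
db = client["rais_anonymized"]

# Count the documents restored into each collection
for name in ("fitbit", "sema", "surveys"):
    print(name, db[name].count_documents({}))
```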
Data Availability
The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain information related to these collections. Each document in any collection follows the format shown below:
{
_id:
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description: This data set contains the street network of New York City, retrieved using Osmnx from OpenStreetMap and converted to a GeoPandas GeoDataFrame. The street network is represented as a series of linestrings connecting nodes that represent intersections in the road network. The data set can be used for a variety of purposes, such as urban planning, transportation analysis, and spatial modeling.
Source: The data set was retrieved using Osmnx, a Python package for downloading and analyzing OpenStreetMap data, and converted to a GeoPandas GeoDataFrame using the osmnx.graph_to_gdfs() function. OpenStreetMap: https://www.openstreetmap.org/#map=4/21.82/82.79
Date: The data set was retrieved on February 24, 2023, and represents the street network of New York City as of that date.
Format: Comma-separated values (CSV) file.
Attributes: The data set includes various attributes for nodes and edges, including geographic coordinates, street names, length, and directionality
The data retrieved using Osmnx can be used for a variety of purposes, including urban planning, transportation engineering, and spatial analysis.
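A sketch of how such a data set can be rebuilt with Osmnx (the place query string and output file name are assumptions):

```python
import osmnx as ox

# Download the street network graph for New York City from OpenStreetMap
G = ox.graph_from_place("New York City, New York, USA", network_type="drive")

# Convert the graph into GeoDataFrames of nodes (intersections) and edges (streets)
nodes, edges = ox.graph_to_gdfs(G)

# Export the edge attributes (geometry, street name, length, directionality, ...) to CSV
edges.to_csv("nyc_street_network.csv")
```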
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Manuscript, data, and code associated with a germination experiment using seed enhancement technologies in New South Wales, Australia.
Two scripts provided for use in R:
1. 'treatment_comparisons.txt' details treatment-wise comparisons of emergence, survival, and average time to emergence between treatments (1) bare seed and (2) pelletised replicates of native species
2. 'trait_script.txt' details comparisons of seed morphological traits as predictors of species performance using pellets

Three major dataframes provided:
- Emergence_data.csv - raw emergence data from the experiment
- seed_traits_no_se.csv - average seed morphological trait information from x-ray images
- emergence_traits.csv - emergence speed data from species in the experiment

Three supporting dataframes provided:
- Amenability.csv - characterised amenability
- results_bin.csv - dataframe based on treatment models, used in plotting results
- pairwise_letters.csv - dataframe based on treatment models, used in plotting results
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports the "Gene expression has more power for predicting in vitro cancer cell vulnerabilities than genomics" preprint by Dempster et al. To generate the figure panels seen in the preprint using these data, use FigurePanelGeneration.ipynb.

This study includes five datasets (citations and details in manuscript):
- Achilles: the Broad Institute's DepMap public 19Q4 CRISPR knockout screens processed with CERES
- Score: the Sanger Wellcome Institute's Project Score CRISPR knockout screens processed with CERES
- RNAi: the DEMETER2-processed combined dataset, which includes RNAi data from the Achilles, DRIVE, and Marcotte breast screens
- PRISM: the PRISM pooled in vitro repurposing primary screen of compounds
- GDSC17: cancer drug in vitro screens performed by Sanger

The files of most interest to a biologist are the Summary.csv files. If you are interested in trying machine learning, the files Features.hdf5 and Target.hdf5 contain the data munged into a convenient form for standard supervised machine learning algorithms.

Some large files are in the binary format HDF5 for efficiency in space and read-in. These files each contain three named HDF5 datasets: "dim_0" holds the row/index names as an array of strings, "dim_1" holds the column names as an array of strings, and "data" holds the matrix contents as a 2D array of floats. In Python, these files can be read in with:

```python
import numpy as np
import pandas as pd
import h5py

def read_hdf5(filename):
    src = h5py.File(filename, 'r')
    try:
        dim_0 = [x.decode('utf8') for x in src['dim_0']]
        dim_1 = [x.decode('utf8') for x in src['dim_1']]
        data = np.array(src['data'])
        return pd.DataFrame(index=dim_0, columns=dim_1, data=data)
    finally:
        src.close()
```

Files (not every dataset will have every type of file listed below):
- AllFeaturePredictions.hdf5: Matrix of cell lines by perturbations, with values indicating the predicted viability using a model with all feature types.
- ENAdditionScore.csv: A matrix of perturbations by number of features. Values indicate an elastic net model performance (Pearson correlation of concatenated out-of-sample predictions with the values given in Target.hdf5) using only the top X features, where X is the column header.
- FeatureDropScore.csv: Perturbations and predictive performance for a model using all single gene expression features EXCEPT those that had greater than 0.1 feature importance in a model trained with all single gene expression features.
- Features.hdf5: A very large matrix of all cell lines by all used CCLE cell features. Continuous features were z-scored. Cell lines missing mutation or expression data were dropped. Remaining NA values were imputed to zero. Feature types are indicated by the column suffixes:
  - _Exp: expression
  - _Hot: hotspot mutation
  - _Dam: damaging mutation
  - _OtherMut: other mutation
  - _CN: copy number
  - _GSEA: ssGSEA score for an MSigDB gene set
  - _MethTSS: methylation of transcription start sites
  - _MethCpG: methylation of CpG islands
  - _Fusion: gene fusions
  - _Cell: cell tissue properties
- NormLRT.csv: the normLRT score for the given perturbation.
- RFAdditionScore.csv: similar to ENAdditionScore, but using a random forest model.
- Summary.csv: A dataframe containing predictive model results. Columns:
  - model: specifies the collection of features used (Expression, Mutation, Exp+CN, etc.)
  - gene: the perturbation (column in Target.hdf5) examined; actually a compound for the PRISM and GDSC17 datasets
  - overall_pearson: Pearson correlation of concatenated out-of-sample predictions with the values given in Target.hdf5
  - feature: the Nth most important feature, found by retraining the model with all cell lines (N = 0-9)
  - feature_importance: the feature importance as assessed by sklearn's RandomForestRegressor
- Target.hdf5: A matrix of cell lines by perturbations, with entries indicating post-perturbation viability scores. Note that the scales of the viability effects are different for different datasets. See manuscript methods for details.
- PerturbationInfo.csv: Additional drug annotations for the PRISM and GDSC17 datasets.
- ApproximateCFE.hdf5: A set of Cancer Functional Event cell features based on CCLE data, adapted from Iorio et al. 2016 (10.1016/j.cell.2016.06.017).
- DepMapSampleInfo.csv: Sample info from DepMap_public_19Q4 data, reproduced here as a convenience.
- GeneRelationships.csv: A list of genes and their related (partner) genes, with the type of relationship (self, protein-protein interaction, CORUM complex membership, paralog).
- OncoKB_oncogenes.csv: A list of genes that have non-expression-based alterations listed as likely oncogenic or oncogenic by OncoKB as of 9 May 2018.
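A usage sketch building on read_hdf5 above (file names from the list; the alignment logic is illustrative, not from the original):

```python
features = read_hdf5("Features.hdf5")
target = read_hdf5("Target.hdf5")

# Keep only cell lines present in both matrices before model fitting
shared = features.index.intersection(target.index)
X, Y = features.loc[shared], target.loc[shared]
```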
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is associated with the article "Occupancy and detection of agricultural threats: the case of Philaenus spumarius, European vector of Xylella fastidiosa" by the same authors, published in JOURNAL 2021. The data about Philaenus spumarius and other co-occurring species were collected in Trentino, Italy, during spring and summer 2018 in olive orchards and vineyards. Provided here are the raw data, some preprocessed data, and the R code used for the analyses presented in the publication. Please refer to the above-mentioned article for more details.
List of files:
- samplings.xlsx: original dataset of field samplings (sheet: survey), site coordinates and info (sheet: info site), and metadata (sheet: legenda)
- counts_per_site.csv: occupancy abundance dataframe for P. spumarius
- philaenus_occupancy_data.csv: occupancy presence dataframe for P. spumarius
- sites.cov.csv: site covariates for the occupancy model
- observation.cov.csv: observation covariates for the occupancy model
- Rcode.zip: commented code and data in R format to run occupancy models for P. spumarius
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
- data.csv: the main data frame used for the primary analysis, in long format
- data_combined.csv: the main data frame used and formatted for the primary analysis
- data_spatial.csv: the data frame used for the spatial autocorrelation
- README file: columns of the data frames explained
https://creativecommons.org/publicdomain/zero/1.0/