13 datasets found
  1. Data Mining Project - Boston

    • kaggle.com
    zip
    Updated Nov 25, 2019
    Cite
    SophieLiu (2019). Data Mining Project - Boston [Dataset]. https://www.kaggle.com/sliu65/data-mining-project-boston
    Explore at:
    Available download formats: zip (59313797 bytes)
    Dataset updated
    Nov 25, 2019
    Authors
    SophieLiu
    Area covered
    Boston
    Description

    Context

    To make this a seamless process, I cleaned the data and deleted many variables that I thought were not important to our dataset. I then uploaded all of those files to Kaggle for each of you to download. The rideshare_data file has both Lyft and Uber, but it is still a cleaned version of the dataset we downloaded from Kaggle.

    Use of Data Files

    You can easily subset the data into the car types that you will be modeling by first loading the csv into R. Here is the code for doing this:

    This loads the file into R

    df<-read.csv('uber.csv')

    The next piece of code subsets the data into specific car types. The example below keeps only the Uber 'Black' car type.

    df_black<-subset(df, df$name == 'Black')

    This next portion of code saves the subset for later use. We write this data frame to a csv file on our computer so that it can be loaded back into R whenever it is needed.

    write.csv(df_black, "nameofthefileyouwanttosaveas.csv")

    The file will appear in your working directory. If you are not sure where your working directory is, run this code:

    getwd()

    The output will be the file path to your working directory. You will find the file you just created in that folder.

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

  2. Data from: Data and code from: Environmental influences on drying rate of...

    • catalog.data.gov
    • datasetcatalog.nlm.nih.gov
    • +2more
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). Data and code from: Environmental influences on drying rate of spray applied disinfestants from horticultural production services [Dataset]. https://catalog.data.gov/dataset/data-and-code-from-environmental-influences-on-drying-rate-of-spray-applied-disinfestants-
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    This dataset includes all the data and R code needed to reproduce the analyses in a forthcoming manuscript: Copes, W. E., Q. D. Read, and B. J. Smith. Environmental influences on drying rate of spray applied disinfestants from horticultural production services. PhytoFrontiers, DOI pending.

    Study description: Instructions for disinfestants typically specify a dose and a contact time to kill plant pathogens on production surfaces. A problem occurs when disinfestants are applied to large production areas where the evaporation rate is affected by weather conditions. The common contact time recommendation of 10 min may not be achieved under hot, sunny conditions that promote fast drying. This study is an investigation into how the evaporation rates of six commercial disinfestants vary when applied to six types of substrate materials under cool to hot and cloudy to sunny weather conditions. Initially, disinfestants with low surface tension spread out to provide 100% coverage and disinfestants with high surface tension beaded up to provide about 60% coverage when applied to hard smooth surfaces. Disinfestants applied to porous materials, such as wood and concrete, were quickly absorbed into the body of the material. Even though disinfestants evaporated faster under hot sunny conditions than under cool cloudy conditions, coverage was reduced considerably in the first 2.5 min under most weather conditions and reduced to less than or equal to 50% coverage by 5 min.

    Dataset contents: This dataset includes R code to import the data and fit Bayesian statistical models using the model fitting software CmdStan, interfaced with R using the packages brms and cmdstanr. The models (one for 2022 and one for 2023) compare how quickly different spray-applied disinfestants dry, depending on what chemical was sprayed, what surface material it was sprayed onto, and what the weather conditions were at the time. Next, the statistical models are used to generate predictions and compare mean drying rates between the disinfestants, surface materials, and weather conditions. Finally, tables and figures are created. These files are included:

    - Drying2022.csv: drying rate data for the 2022 experimental run
    - Weather2022.csv: weather data for the 2022 experimental run
    - Drying2023.csv: drying rate data for the 2023 experimental run
    - Weather2023.csv: weather data for the 2023 experimental run
    - disinfestant_drying_analysis.Rmd: RMarkdown notebook with all data processing, analysis, and table creation code
    - disinfestant_drying_analysis.html: rendered output of the notebook
    - MS_figures.R: additional R code to create figures formatted for journal requirements
    - fit2022_discretetime_weather_solar.rds: fitted brms model object for 2022. This allows users to reproduce the model prediction results without having to refit the model, which was originally fit on a high-performance computing cluster
    - fit2023_discretetime_weather_solar.rds: fitted brms model object for 2023
    - data_dictionary.xlsx: descriptions of each column in the CSV data files
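
    As a quick orientation, here is a minimal R sketch for working with these files without refitting anything; it assumes the files above sit in the working directory and that the brms package is installed.

    library(brms)

    # Raw data for the 2022 experimental run
    drying2022 <- read.csv("Drying2022.csv")
    weather2022 <- read.csv("Weather2022.csv")

    # Fitted model object shipped with the dataset (originally fit on an HPC cluster)
    fit2022 <- readRDS("fit2022_discretetime_weather_solar.rds")
    summary(fit2022)  # posterior summaries of the 2022 drying-rate model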

  3. Data for the Farewell and Herberg example of a two-phase experiment using a...

    • datasetcatalog.nlm.nih.gov
    • researchdata.edu.au
    • +1more
    Updated Jun 12, 2021
    Cite
    Brien, Chris (2021). Data for the Farewell and Herberg example of a two-phase experiment using a plaid design [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000884242
    Explore at:
    Dataset updated
    Jun 12, 2021
    Authors
    Brien, Chris
    Description

    The experiment that Farewell and Herzberg (2003) describe is a pain-rating experiment that is a subset of the experiment reported by Solomon et al. (1997). It is a two-phase experiment. The first phase is a self-assessment phase in which patients self-assess for pain while moving a painful shoulder joint. The second phase is an evaluation phase in which occupational and physical therapy students (the raters) are evaluated on their ratings of patients' pain in a set of videos. The measured response is the difference between a student's rating and the patient's rating.

    The R data file plaid.dat.rda contains the data.frame plaid.dat, which holds a revised version of the data for the Farewell and Herzberg example downloaded from https://doi.org/10.17863/CAM.54494. The comma-delimited text file plaid.dat.csv has the same information in this more commonly accepted format, but without the metadata associated with the data.frame. The data.frame contains the factors Raters, Viewings, Trainings, Expressiveness, Patients, Occasions, and Motions and a column for the response variable Y. The two factors Viewings and Occasions are additional to those in the downloaded file; the remaining factors have been converted from integers or characters to factors and renamed to the names given above. The column Y is unchanged from the column in the original file.

    To load the data in R use: load("plaid.dat.rda") or plaid.dat <- read.csv(file = "plaid.dat.csv").

    References
    Farewell, V. T., & Herzberg, A. M. (2003). Plaid designs for the evaluation of training for medical practitioners. Journal of Applied Statistics, 30(9), 957-965. https://doi.org/10.1080/0266476032000076092
    Solomon, P. E., Prkachin, K. M., & Farewell, V. (1997). Enhancing sensitivity to facial expression of pain. Pain, 71(3), 279-284. https://doi.org/10.1016/S0304-3959(97)03377-0
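
    For concreteness, a minimal sketch of the two loading routes just mentioned, assuming the files are in the working directory:

    load("plaid.dat.rda")  # gives the data.frame plaid.dat with its factor metadata intact
    # or, without the factor metadata:
    plaid.dat <- read.csv(file = "plaid.dat.csv")
    str(plaid.dat)  # factors Raters, Viewings, Trainings, ..., and the response Y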

  4. PARAMOUNT: parallel modal analysis of large datasets

    • data.4tu.nl
    zip
    Updated Nov 28, 2022
    Cite
    Alireza Ghasemi; Jim Kok (2022). PARAMOUNT: parallel modal analysis of large datasets [Dataset]. http://doi.org/10.4121/20089760.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 28, 2022
    Dataset provided by
    4TU.ResearchData
    Authors
    Alireza Ghasemi; Jim Kok
    License

    https://www.gnu.org/licenses/gpl-3.0.html

    Description

    PARAMOUNT: parallel modal analysis of large datasets

    PARAMOUNT is a Python package developed at the University of Twente to perform modal analysis of large numerical and experimental datasets. A brief video introduction to the theory and methodology is presented here.

    Features

    - Distributed processing of data on local machines or clusters using Dask Distributed
    - Reading CSV files in glob format from specified folders
    - Extracting relevant columns from CSV files and writing Parquet database for each specified variable
    - Distributed computation of Proper Orthogonal Decomposition (POD)
    - Writing U, S and V matrices into Parquet database for further analysis
    - Visualizing POD modes and coefficients using pyplot


    Using PARAMOUNT

    Make sure to install the dependencies by running `pip install -r requirements.txt`

    Refer to csv_example to see how to use PARAMOUNT to read CSV files, write the variables of interest into Parquet datasets and inspect the final datasets.

    Refer to svd_example to see how to read Parquet datasets, compute the Singular Value Decomposition, and store the results in Parquet format.

    To visualize the results you can simply read the U, S and V Parquet files with your plotting tool of choice. Examples are provided in viz_example.
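
    PARAMOUNT itself is Python, but because the results are plain Parquet databases they can also be inspected from other tools. As a hedged illustration in R (the file names below are placeholders for wherever the U, S and V databases were written):

    library(arrow)

    # Read the SVD factors written by PARAMOUNT (paths are hypothetical)
    U <- read_parquet("U.parquet")
    S <- read_parquet("S.parquet")

    # Singular values indicate how much energy each POD mode carries
    plot(seq_along(S[[1]]), S[[1]], log = "y", xlab = "mode", ylab = "singular value")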

    Author and Acknowledgements

    This package is developed by Alireza Ghasemi (alireza.ghasemi@utwente.nl) at University of Twente under the MAGISTER (https://www.magister-itn.eu/) project. This project has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 766264.

  5. Annotated 12 lead ECG dataset

    • zenodo.org
    zip
    Updated Jun 7, 2021
    + more versions
    Cite
    Antonio H Ribeiro; Manoel Horta Ribeiro; Gabriela M. Paixão; Derick M. Oliveira; Paulo R. Gomes; Jéssica A. Canazart; Milton P. Ferreira; Carl R. Andersson; Peter W. Macfarlane; Wagner Meira Jr.; Thomas B. Schön; Antonio Luiz P. Ribeiro (2021). Annotated 12 lead ECG dataset [Dataset]. http://doi.org/10.5281/zenodo.3625007
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 7, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Antonio H Ribeiro; Manoel Horta Ribeiro; Gabriela M. Paixão; Derick M. Oliveira; Paulo R. Gomes; Jéssica A. Canazart; Milton P. Ferreira; Carl R. Andersson; Peter W. Macfarlane; Wagner Meira Jr.; Thomas B. Schön; Antonio Luiz P. Ribeiro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    # Annotated 12 lead ECG dataset
    
    Contains 827 ECG tracings from different patients, annotated by several cardiologists, residents and medical students.
    It is used as the test set in the paper:
    "Automatic Diagnosis of the Short-Duration 12-Lead ECG using a Deep Neural Network".

    It contains annotations for 6 different ECG abnormalities:
    - 1st degree AV block (1dAVb);
    - right bundle branch block (RBBB);
    - left bundle branch block (LBBB);
    - sinus bradycardia (SB);
    - atrial fibrillation (AF); and,
    - sinus tachycardia (ST).
    
    ## Folder content:
    
    - `ecg_tracings.hdf5`: HDF5 file containing a single dataset named `tracings`. This dataset is a
    `(827, 4096, 12)` tensor. The first dimension corresponds to the 827 different exams from different
    patients; the second dimension corresponds to the 4096 signal samples; the third dimension to the 12
    different leads of the ECG exam.
    
    The signals are sampled at 400 Hz. Some signals originally have a duration of
    10 seconds (10 * 400 = 4000 samples) and others of 7 seconds (7 * 400 = 2800 samples).
    In order to make them all have the same size (4096 samples) we pad them with zeros
    on both sides. For instance, for a 7-second ECG signal with 2800 samples we include 648
    samples at the beginning and 648 samples at the end, yielding 4096 samples that are then saved
    in the hdf5 dataset. All signals are represented as floating point numbers at the scale 1e-4V: so they should
    be multiplied by 1000 in order to obtain the signals in V.
    
    In Python, one can read this file using the following sequence:
    ```python
    import h5py
    import numpy as np

    # path to the HDF5 file described above
    with h5py.File("ecg_tracings.hdf5", "r") as f:
        x = np.array(f['tracings'])
    ```
    
    - The file `attributes.csv` contains basic patient attributes: sex (M or F) and age. It
    contains 827 lines (plus the header). The i-th tracing in `ecg_tracings.hdf5` corresponds to the i-th line.
    - `annotations/`: folder containing annotations in csv format. Each csv file contains 827 lines (plus the header).
    The i-th line of each csv file corresponds to the i-th tracing in `ecg_tracings.hdf5`.
    The csv files all have 6 columns `1dAVb, RBBB, LBBB, SB, AF, ST`
    corresponding to whether the annotator detected the abnormality in the ECG (`=1`) or not (`=0`).
     1. `cardiologist[1,2].csv` contain annotations from two different cardiologists.
     2. `gold_standard.csv` gold standard annotation for this test dataset. When cardiologist 1 and cardiologist 2
     agree, the common diagnosis was considered as the gold standard. In cases where there was any disagreement, a
     third senior specialist, aware of the annotations from the other two, decided the diagnosis.
     3. `dnn.csv` predictions from the deep neural network described in
     "Automatic Diagnosis of the Short-Duration 12-Lead ECG using a Deep Neural Network". The threshold is set in such a way
     that it maximizes the F1 score.
     4. `cardiology_residents.csv` annotations from two 4th year cardiology residents (each annotated half of the dataset).
     5. `emergency_residents.csv` annotations from two 3rd year emergency residents (each annotated half of the dataset).
     6. `medical_students.csv` annotations from two 5th year medical students (each annotated half of the dataset).
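
    A minimal R sketch, assuming the files above sit in the working directory, for placing the patient attributes next to the gold-standard labels (file and column names are taken from the description):

    ```r
    attributes <- read.csv("attributes.csv")                    # 827 rows: sex, age
    gold_standard <- read.csv("annotations/gold_standard.csv")  # 827 rows: 1dAVb, RBBB, LBBB, SB, AF, ST
    ecg_labels <- cbind(attributes, gold_standard)              # row i matches tracing i in ecg_tracings.hdf5
    head(ecg_labels)
    ```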
    
  6. FacialRecognition

    • kaggle.com
    zip
    Updated Dec 1, 2016
    Cite
    TheNicelander (2016). FacialRecognition [Dataset]. https://www.kaggle.com/petein/facialrecognition
    Explore at:
    Available download formats: zip (121674455 bytes)
    Dataset updated
    Dec 1, 2016
    Authors
    TheNicelander
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    # https://www.kaggle.com/c/facial-keypoints-detection/details/getting-started-with-r
    #################################

    ### Variables for downloaded files
    data.dir <- ' '
    train.file <- paste0(data.dir, 'training.csv')
    test.file <- paste0(data.dir, 'test.csv')
    #################################

    ### Load csv -- creates a data.frame where each column can have a different type.
    d.train <- read.csv(train.file, stringsAsFactors = F)
    d.test <- read.csv(test.file, stringsAsFactors = F)

    ### In training.csv we have 7049 rows, each one with 31 columns.
    ### The first 30 columns are keypoint locations, which R correctly identified as numbers.
    ### The last one is a string representation of the image.

    ### To look at samples of the data, uncomment this line:
    # head(d.train)

    ### Save the Image column as another variable and remove it from d.train and d.test.
    ### Assigning NULL to a column removes it from the dataframe.
    im.train <- d.train$Image
    d.train$Image <- NULL  # removes 'Image' from the dataframe
    im.test <- d.test$Image
    d.test$Image <- NULL   # removes 'Image' from the dataframe

    #################################
    ### The image is represented as a series of numbers, stored as a string.
    ### Convert these strings to integers by splitting them and converting the result:
    ### strsplit splits the string, unlist simplifies its output to a vector of strings,
    ### as.integer converts it to a vector of integers.
    as.integer(unlist(strsplit(im.train[1], " ")))
    as.integer(unlist(strsplit(im.test[1], " ")))

    ### Install and activate the appropriate libraries.
    ### The tutorial is meant for Linux and OS X, which use a different parallel backend, so
    ### replace all instances of %dopar% with %do%.
    install.packages('foreach')
    library("foreach", lib.loc = "~/R/win-library/3.3")

    ### Convert every image
    im.train <- foreach(im = im.train, .combine = rbind) %do% {
      as.integer(unlist(strsplit(im, " ")))
    }
    im.test <- foreach(im = im.test, .combine = rbind) %do% {
      as.integer(unlist(strsplit(im, " ")))
    }
    # The foreach loop evaluates the inner command for each image in im.train and combines
    # the results with rbind (combine by rows). %dopar% would run the evaluations in
    # parallel; with %do% they run sequentially.
    # im.train is now a matrix with 7049 rows (one for each image) and 9216 columns (one for each pixel).

    ### Save all four variables in a data.Rd file; they can be reloaded at any time with load('data.Rd').
    save(d.train, im.train, d.test, im.test, file = 'data.Rd')
    load('data.Rd')

    ### Each image is a vector of 96*96 pixels (96*96 = 9216).
    ### Convert these 9216 integers into a 96x96 matrix:
    im <- matrix(data = rev(im.train[1, ]), nrow = 96, ncol = 96)
    # im.train[1, ] returns the first row of im.train, which corresponds to the first training image.
    # rev reverses the resulting vector to match the interpretation of R's image function
    # (which expects the origin to be in the lower left corner).

    ### To visualize the image we use R's image function:
    image(1:96, 1:96, im, col = gray((0:255)/255))

    ### Let's color the coordinates of the eyes and nose:
    points(96 - d.train$nose_tip_x[1], 96 - d.train$nose_tip_y[1], col = "red")
    points(96 - d.train$left_eye_center_x[1], 96 - d.train$left_eye_center_y[1], col = "blue")
    points(96 - d.train$right_eye_center_x[1], 96 - d.train$right_eye_center_y[1], col = "green")

    ### Another good check is to see how variable the data is.
    ### For example, where are the centers of the noses in the 7049 images? (this takes a while to run):
    for (i in 1:nrow(d.train)) {
      points(96 - d.train$nose_tip_x[i], 96 - d.train$nose_tip_y[i], col = "red")
    }

    ### There are quite a few outliers -- they could be labeling errors. Looking at one extreme example:
    ### in this case there is no labeling error, but it shows that not all faces are centered.
    idx <- which.max(d.train$nose_tip_x)
    im <- matrix(data = rev(im.train[idx, ]), nrow = 96, ncol = 96)
    image(1:96, 1:96, im, col = gray((0:255)/255))
    points(96 - d.train$nose_tip_x[idx], 96 - d.train$nose_tip_y[idx], col = "red")

    ### One of the simplest things to try is to compute the mean of the coordinates of each keypoint
    ### in the training set and use that as a prediction for all images:
    colMeans(d.train, na.rm = T)

    ### To build a submission file we need to apply these computed coordinates to the test instances:
    p <- matrix(data = colMeans(d.train, na.rm = T), nrow = nrow(d.test), ncol = ncol(d.train), byrow = T)
    colnames(p) <- names(d.train)
    predictions <- data.frame(ImageId = 1:nrow(d.test), p)
    head(predictions)

    ### The expected submission format has one keypoint per row, which the reshape2 library makes easy:
    install.packages('reshape2')

    library(...

  7. Kickastarter Campaigns

    • kaggle.com
    zip
    Updated Jan 25, 2024
    Cite
    Alessio Cantara (2024). Kickastarter Campaigns [Dataset]. https://www.kaggle.com/datasets/alessiocantara/kickastarter-project/discussion
    Explore at:
    Available download formats: zip (2233314 bytes)
    Dataset updated
    Jan 25, 2024
    Authors
    Alessio Cantara
    Description

    Welcome to my Kickstarter case study! In this project I’m trying to understand what the success factors for a Kickstarter campaign are, analyzing a publicly available dataset from Web Robots. The process of analysis will follow the data analysis roadmap: ASK, PREPARE, PROCESS, ANALYZE, SHARE and ACT.

    ASK

    Three questions will guide my analysis:
    1. Does the campaign duration influence the success of the project?
    2. Does the chosen funding goal?
    3. Which category of campaign is the most likely to be successful?

    PREPARE

    I’m using the Kickstarter datasets publicly available on Web Robots. Data are scraped by a bot that collects them in CSV format once a month. Each table contains:
    - backers_count: number of people that contributed to the campaign
    - blurb: a captivating text description of the project
    - category: the label categorizing the campaign (technology, art, etc.)
    - country
    - created_at: day and time of campaign creation
    - deadline: day and time of the campaign's latest possible end
    - goal: amount to be collected
    - launched_at: date and time of campaign launch
    - name: name of campaign
    - pledged: amount of money collected
    - state: success or failure of the campaign

    Each monthly scrape produces a huge number of CSVs, so for an initial analysis I decided to focus on three months: November 2023, December 2023, and January 2024. I downloaded zipped files which, once unzipped, contained respectively 7 CSVs (November 2023), 8 CSVs (December 2023), and 8 CSVs (January 2024). Each month went into its own folder.

    Having a first look at the spreadsheets, it’s clear that there is some need for cleaning and modification: for example, dates and times are stored as Unix timestamps, there are multiple columns that are not helpful for the scope of my analysis, and currencies need to be unified (some are US$, some GB£, etc.). In general, I have all the data that I need to answer my initial questions, identify trends, and make predictions.

    PROCESS

    I decided to use R to clean and process the data. For each month I set up a new working environment in its own folder and loaded the necessary libraries:

    library(tidyverse)
    library(lubridate)
    library(ggplot2)
    library(dplyr)
    library(tidyr)

    I then scripted a general R chunk that searches for CSV files in the folder, opens each one as a separate variable, and collects them into a single list of data frames:

    csv_files <- list.files(pattern = "\\.csv$")
    data_frames <- list()
    
    for (file in csv_files) {
     variable_name <- sub("\\.csv$", "", file)
     assign(variable_name, read.csv(file))
     data_frames[[variable_name]] <- get(variable_name)
    }
    

    Next, I converted some columns to numeric values because I was running into type errors when trying to merge all the CSVs into a single comprehensive file.

    data_frames <- lapply(data_frames, function(df) {
     df$converted_pledged_amount <- as.numeric(df$converted_pledged_amount)
     return(df)
    })
    data_frames <- lapply(data_frames, function(df) {
     df$usd_exchange_rate <- as.numeric(df$usd_exchange_rate)
     return(df)
    })
    data_frames <- lapply(data_frames, function(df) {
     df$usd_pledged <- as.numeric(df$usd_pledged)
     return(df)
    })
    

    In each folder I then ran a command to merge the CSVs in a single file (one for November 2023, one for December 2023 and one for January 2024):

    all_nov_2023 = bind_rows(data_frames)
    all_dec_2023 = bind_rows(data_frames)
     all_jan_2024 = bind_rows(data_frames)
    

    After merging, I converted the Unix timestamps into readable datetimes for the “created”, “launched” and “deadline” columns and dropped all the rows that had these fields set to 0. I also extracted the category slug from the “category” column so that it shows only the category of the campaign, without information that is unnecessary for the scope of my analysis. The final table was then saved.

    filtered_dec_2023 <- all_dec_2023 %>% #this was modified according to the considered month
     select(blurb, backers_count, category, country, created_at, launched_at, deadline,currency, usd_exchange_rate, goal, pledged, state) %>%
     filter(created_at != 0 & deadline != 0 & launched_at != 0) %>% 
     mutate(category_slug = sub('.*?"slug":"(.*?)".*', '\\1', category)) %>% 
     mutate(created = as.POSIXct(created_at, origin = "1970-01-01")) %>% 
     mutate(launched = as.POSIXct(launched_at, origin = "1970-01-01")) %>% 
     mutate(setted_deadline = as.POSIXct(deadline, origin = "1970-01-01")) %>% 
     select(-category, -deadline, -launched_at, -created_at) %>% 
     relocate(created, launched, setted_deadline, .before = goal)
    
    write.csv(filtered_dec_2023, "filtered_dec_2023.csv", row.names = FALSE)
    
    

    The three generated files were then merged into one comprehensive CSV called "kickstarter_cleaned" which was further modified, converting a...

  8. Cyclistic_Divvy_data

    • kaggle.com
    zip
    Updated Jun 11, 2023
    Cite
    Rami Ghaith (2023). Cyclistic_Divvy_data [Dataset]. https://www.kaggle.com/datasets/ramighaith/cyclistic-divvy-data
    Explore at:
    Available download formats: zip (21440758 bytes)
    Dataset updated
    Jun 11, 2023
    Authors
    Rami Ghaith
    Description

    The following data shows riding information for members vs. casual riders at the company Cyclistic (a made-up name). This dataset is used as a case study for the Google Data Analytics certificate.

    The changes made to the data in Excel:
    - Removed all duplicates (none were found).
    - Added a ride_length column by subtracting started_at from ended_at with the formula "=C2-B2", then formatted that column as a Time type, 37:30:55.
    - Added a day_of_week column using the formula "=WEEKDAY(B2,1)" to display the day the ride took place on, 1 = Sunday through 7 = Saturday.
    - Data displayed as ######## was left unchanged; it simply represents negative values and should be treated as 0.

    Processing the data in RStudio:
    - Installed the required packages: tidyverse for data import and wrangling, lubridate for date functions, and ggplot2 for visualization.
    - Step 1: Read the csv files into R to collect the data.
    - Step 2: Made sure the files all contained the same column names because I wanted to merge them into one.
    - Step 3: Renamed columns so they align, then merged the files into one combined data frame (see the sketch after this list).
    - Step 4: More data cleaning and analysis.
    - Step 5: Once my data was cleaned and clearly telling a story, I began to visualize it. The visualizations can be seen below.
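
    A minimal sketch of steps 1 through 3 (plus recomputing the Excel columns in R), with hypothetical file and column names since the description does not list them:

    library(tidyverse)
    library(lubridate)

    # Step 1: read the monthly csv exports (file names are hypothetical)
    q1 <- read_csv("divvy_tripdata_2023_01.csv")
    q2 <- read_csv("divvy_tripdata_2023_02.csv")

    # Step 2: check that the column names line up before merging
    identical(names(q1), names(q2))

    # Step 3: rename any columns that differ (the mismatches here are hypothetical), then merge
    q1 <- rename(q1, started_at = start_time, ended_at = end_time)
    all_trips <- bind_rows(q1, q2)

    # Step 4: recompute ride length and day of week in R rather than Excel
    all_trips <- all_trips %>%
      mutate(ride_length = difftime(ended_at, started_at, units = "mins"),
             day_of_week = wday(started_at))  # 1 = Sunday ... 7 = Saturday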

  9. Data and scripts for "The importance of within-log sampling replication in...

    • zenodo.org
    bin, csv
    Updated May 6, 2025
    Cite
    Domenica Naranjo Orrico; Jenna Purhonen; Brendan Furneaux; Katri Ketola; Otso Ovaskainen; Nerea Abrego (2025). Data and scripts for "The importance of within-log sampling replication in bark- and wood-inhabiting fungal metabarcoding studies" [Dataset]. http://doi.org/10.5281/zenodo.15323471
    Explore at:
    Available download formats: csv, bin
    Dataset updated
    May 6, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Domenica Naranjo Orrico; Jenna Purhonen; Brendan Furneaux; Katri Ketola; Otso Ovaskainen; Nerea Abrego
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data and scripts for reproducing the analyses of Naranjo-Orrico et al, "The importance of within-log sampling replication in bark- and wood-inhabiting fungal metabarcoding studies".

    The input data consist of the following four files: "Alldata.Rdata", "data_SbVenn_meta&morpho.Rdata", "Xmorpho.csv" and "Ymorpho_1.csv". The first two files are in R format and the latter two in CSV format. The R files need to be loaded using the function load, and the CSV files with the function read.csv2 in R (see the sketch below).
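
    A minimal sketch of that loading step, assuming the four files sit in the working directory:

    # R-format inputs: load() attaches the stored matrices (meta22, otu.table.plausible.2022, ...)
    load("Alldata.Rdata")
    load("data_SbVenn_meta&morpho.Rdata")

    # CSV inputs: the description says to read these with read.csv2()
    Xmorpho <- read.csv2("Xmorpho.csv")    # metadata of the morphologically identified lichens
    Ymorpho <- read.csv2("Ymorpho_1.csv")  # presence-absence data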

    "Alldata.Rdata" includes in total 15 input data matrices:

    - Metadata for dataset A (meta22)

    - Metadata for dataset B (meta23)

    - Two sample x OTU tables for dataset A including the number of reads for each OTU (otu.table.plausible.2022 for the plausible OTU taxonomic identifications and otu.table.reliable.2022 for the reliable taxonomic identifications).

    - Two sample x OTU tables for dataset B including the number of reads for each OTU (otu.table.plausible.2023 for the plausible OTU taxonomic identifications and otu.table.reliable.2023 for the reliable OTU taxonomic identifications).

    - Two sample x OTU tables for dataset A including the relative read counts per OTU (otu.table.plausible.w.2022 and otu.table.reliable.w.2022).

    - Two sample x OTU tables for dataset B including the relative read counts per OTU (otu.table.plausible.w.2023 and otu.table.reliable.w.2023).

    - Read counts per sample during the different phases of the bioinformatics pipeline for dataset A (read.counts.plausible.2022) and for dataset B (read.counts.plausible.2023).

    - Taxonomic information at all taxonomic levels (i.e., from species to phylum) of the identified OTUs (taxonomy.plausible)

    - Guild assignment matrices for dataset A (Guilds_plausible_tax_2022) and for dataset B (Guilds_plausible_tax_2023).

    "data_SbVenn_meta&morpho.Rdata" contains four matrices:

    - Occurrence of the lichenized OTUs identified through metabarcoding including identifications at any taxonomic level (i.e., genus or family levels when species level identifications were not achieved) (SbVenn_Lmeta).

    - Occurrence of the lichenized OTUs identified through metabarcoding including identifications at the species-only level (SB_Venn_clean_meta).

    - Occurrence of the morphologically identified lichenized fungi, including identifications at the genus level and morphospecies (SbVenn_Lmorpho)

    - Occurrences of the morphologically identified lichenized fungi, including identifications at the species-only level (SbVenn_clean_morpho).

    "Xmorpho.csv" and "Ymorpho_1.csv" contain respectively the metadata and the presence-absence data of the morphologically identified lichens.

    “Alldata.Rdata” is used in all the scripts, "data_SbVenn_meta&morpho.Rdata" is only needed for the script "S8_Venn Diagrams.R", and the files "Xmorpho.csv" and "Ymorpho_1.csv" are used in "S11_Meta vs Morpho species richnes between tree sp and tree part.R".

    The statistical analyses consist of joint species distribution modelling with the package Hmsc, generalized linear mixed models (GLMM) with the package glmer, and non-metric multidimensional scaling analysis (NMDS) with the package vegan. To perform the HMSC analyses, the first four scripts need to be run consecutively, from S1 (A and B) to S3. S1A defines the first model using dataset A, and S1B defines the second model using dataset B. S2 fits the models used in the study (which include presence-absence models with different sets of explanatory variables). S3 shows the parameter estimates from the fitted models, in particular the beta parameters and the variance partitioning across environmental covariates. For fitting and showing the outputs of the GLMM models, only S4 is needed. S5 runs the NMDS analyses. The rest of the scripts, S6-S11, are used to produce the different plots shown in the study of Naranjo-Orrico et al., including pie plots, boxplots, barplots, and Venn plots.

  10. Dollar-Rial-Toman Live Price Dataset

    • kaggle.com
    zip
    Updated Nov 7, 2025
    + more versions
    Cite
    Koorosh Komeilizadeh (2025). Dollar-Rial-Toman Live Price Dataset [Dataset]. https://www.kaggle.com/datasets/kooroshkz/dollar-rial-toman-live-price-dataset
    Explore at:
    Available download formats: zip (66708 bytes)
    Dataset updated
    Nov 7, 2025
    Authors
    Koorosh Komeilizadeh
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dollar-Rial-Toman Live Price Dataset

    A comprehensive, daily-updated dataset of US Dollar to Iranian Rial exchange rates (USD/IRR) with historical data from November 2011 to present. This dataset is ideal for financial analysis, economic research, forecasting, and machine learning projects.

    Dataset Overview

    • Time Period: November 26, 2011 - Present (continuously updated)
    • Total Records: 3,648+ daily price points
    • Data Source: TGJU.org (Tehran Gold & Jewelry Union)
    • Update Frequency: Daily (automated via GitHub Actions)
    • Format: CSV with proper date formatting and integer price structure

    Data Structure

    The CSV file contains the following columns:

    Column          | Description                  | Format     | Example
    Open Price      | Opening price of the day     | Integer    | 1012100
    Low Price       | Lowest price of the day      | Integer    | 1011700
    High Price      | Highest price of the day     | Integer    | 1034100
    Close Price     | Closing price of the day     | Integer    | 1029800
    Change Amount   | Price change amount          | String     | 15400
    Change Percent  | Price change percentage      | String     | 1.52%
    Gregorian Date  | Gregorian date               | YYYY/MM/DD | 2025/09/06
    Persian Date    | Persian/Shamsi date          | YYYY/MM/DD | 1404/06/15

    Download the Data

    View Scraper and workflow source on GitHub

    This live dataset, the scraper source code, and the workflow are available on GitHub, where you can explore, download, and use them directly.

    Documentation & Charts

    "https://kooroshkz.github.io/Dollar-Rial-Toman-Live-Price-Dataset/" target="_blank"> imagehttps://raw.githubusercontent.com/kooroshkz/Dollar-Rial-Toman-Live-Price-Dataset/main/assets/img/IntractiveChart.png">

    Interactive charts and dataset overview are available at:
    kooroshkz.github.io/Dollar-Rial-Toman-Live-Price-Dataset

    Loading in Python

    import pandas as pd
    
    # Load dataset
    df = pd.read_csv('data/Dollar_Rial_Price_Dataset.csv')
    
    # Convert date column to datetime
    df['Gregorian Date'] = pd.to_datetime(df['Gregorian Date'], format='%Y/%m/%d')
    
    # Price columns are already integers
    price_columns = ['Open Price', 'Low Price', 'High Price', 'Close Price']
    print(df[price_columns].dtypes) # All should be int64
    

    Direct Load in Python

    # pip install kagglehub[hf-datasets]
    import kagglehub
    
    df = kagglehub.load_dataset(
      "kooroshkz/dollar-rial-toman-live-price-dataset",
      adapter="huggingface",
      file_path="Dollar_Rial_Price_Dataset.csv",
      pandas_kwargs={"parse_dates": ["Gregorian Date"]}
    )
    
    print(df.head())
    

    Loading in R

    # Load dataset
    data <- read.csv("data/Dollar_Rial_Price_Dataset.csv", stringsAsFactors = FALSE)
    
    # Convert date column
    data$Gregorian.Date <- as.Date(data$Gregorian.Date, format = "%Y/%m/%d")
    
    # View structure
    str(data)
    

    Data Quality & Updates

    • Validation: All price data undergoes validation checks for accuracy
    • Automated Updates: Dataset is automatically updated daily at 8:00 AM UTC
    • Data Integrity: Built-in duplicate prevention and format validation
    • Historical Consistency: Maintains consistent formatting across all time periods
    • Integer Prices: All price values stored as integers for precise calculations

    Technical Implementation

    This dataset is maintained using an automated web scraping system that:

    • Monitors TGJU.org for new exchange rate data
    • Validates and processes new records
    • Maintains data consistency and prevents duplicates
    • Automatically commits updates to the repository

    Contributing

    If you find data inconsistencies or have suggestions for improvements, please open an issue in the GitHub repository.

    License

    This project is licensed under the MIT License - see the LICENSE file for details.

    Citation

    If you use this dataset in your research or projects, please cite:

    Dollar-Rial-Toman Live Price Dataset
    Author: Koorosh Komeili Zadeh
    Source: https://github.com/kooroshkz/Dollar-Rial-Toman-Live-Price-Dataset
    Data Source: TGJU.org (Tehran Gold & Jewelry Union)
    Date Range: November 2011 - Present
    

    Keywords

    USD to Rial dataset, Dollar to Toman dataset, Iran exchange rate CSV, USD/IRR daily price, foreign exchange Iran dataset, TGJU data, time series currency dataset

    Disclaimer

    This ...

  11. ESG rating of general stock indices

    • data.mendeley.com
    • narcis.nl
    Updated Oct 22, 2021
    Cite
    Szilárd Erhart (2021). ESG rating of general stock indices [Dataset]. http://doi.org/10.17632/58mwkj5pf8.1
    Explore at:
    Dataset updated
    Oct 22, 2021
    Authors
    Szilárd Erhart
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    THE FILES HAVE BEEN CREATED BY SZILÁRD ERHART FOR THE RESEARCH PAPER: ERHART (2021): ESG RATINGS OF GENERAL STOCK EXCHANGE INDICES, INTERNATIONAL REVIEW OF FINANCIAL ANALYSIS.

    USERS OF THE FILES AGREE TO QUOTE THE ABOVE PAPER

    THE PYTHON SCRIPT (PYTHONESG_ERHART.TXT) HELPS USERS TO GET TICKERS BY STOCK EXCHANGES AND EXTRACT ESG SCORES FOR THE UNDERLYING STOCKS FROM YAHOO FINANCE.

    THE R SCRIPT (ESG_UA.TXT) HELPS TO REPLICATE THE MONTE CARLO EXPERIMENT DETAILED IN THE STUDY.

    THE EXPORT_ALL CSV CONTAINS THE DOWNLOADED ESG DATA (SCORES, CONTROVERSIES, ETC) ORGANIZED BY STOCKS AND EXCHANGES.

    DISCLAIMER

    The author takes no responsibility for the timeliness, accuracy, completeness or quality of the information provided. The author is in no event liable for damages of any kind incurred or suffered as a result of the use or non-use of the information presented or the use of defective or incomplete information. The contents are subject to confirmation and not binding. The author expressly reserves the right to alter or amend the content, in whole or in part, without prior notice, or to discontinue publication for a period of time or even completely.

    ##############################READ ME

    BEFORE USING THE MONTE CARLO SIMULATIONS SCRIPT:

    (1) COPY THE goascores.csv and goalscores_alt.csv FILES ONTO YOUR OWN COMPUTER DRIVE. THE TWO FILES ARE IDENTICAL.

    (2) SET THE EXACT FILE LOCATION INFORMATION IN THE 'Read in data' SECTION OF THE MONTE CARLO SCRIPT AND FOR THE OUTPUT FILES AT THE END OF THE SCRIPT

    (3) LOAD MISC TOOLS AND MATRIXSTATS IN YOUR R APPLICATION

    (4) RUN THE CODE.

    ##############################READ ME
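
    A minimal sketch of steps (1)-(3), assuming the CRAN packages miscTools and matrixStats are the ones the read-me refers to; the file path is a placeholder:

    library(miscTools)    # step (3): "MISC TOOLS"
    library(matrixStats)  # step (3): "MATRIXSTATS"

    # steps (1)-(2): point the 'Read in data' section of the Monte Carlo script to your local copy
    goalscores <- read.csv("C:/your/drive/goascores.csv")  # goalscores_alt.csv is identical
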
  12. Data from: A dataset to model Levantine landcover and land-use change...

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Dec 16, 2023
    Cite
    Michael Kempf (2023). A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19 [Dataset]. http://doi.org/10.5281/zenodo.10396148
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 16, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Michael Kempf
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 16, 2023
    Area covered
    Levant
    Description

    Overview

    This dataset is the repository for the following paper submitted to Data in Brief:

    Kempf, M. A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19. Data in Brief (submitted: December 2023).

    The Data in Brief article contains the supplement information and is the related data paper to:

    Kempf, M. Climate change, the Arab Spring, and COVID-19 - Impacts on landcover transformations in the Levant. Journal of Arid Environments (revision submitted: December 2023).

    Description/abstract

    The Levant region is highly vulnerable to climate change, experiencing prolonged heat waves that have led to societal crises and population displacement. Since 2010, the area has been marked by socio-political turmoil, including the Syrian civil war and currently the escalation of the so-called Israeli-Palestinian Conflict, which has strained neighbouring countries like Jordan due to the influx of Syrian refugees and increased population vulnerability to governmental decision-making. Jordan, in particular, has seen rapid population growth and significant changes in land-use and infrastructure, leading to over-exploitation of the landscape through irrigation and construction. This dataset uses climate data, satellite imagery, and land cover information to illustrate the substantial increase in construction activity and highlights the intricate relationship between climate change predictions and current socio-political developments in the Levant.

    Folder structure

    The main folder after download contains all data; the following subfolders are stored as zipped files:

    “code” stores the 9 code chunks described below to read, extract, process, analyse, and visualize the data.

    “MODIS_merged” contains the 16-days, 250 m resolution NDVI imagery merged from three tiles (h20v05, h21v05, h21v06) and cropped to the study area, n=510, covering January 2001 to December 2022 and including January and February 2023.

    “mask” contains a single shapefile, which is the merged product of administrative boundaries, including Jordan, Lebanon, Israel, Syria, and Palestine (“MERGED_LEVANT.shp”).

    “yield_productivity” contains .csv files of yield information for all countries listed above.

    “population” contains two files with the same name but different format. The .csv file is for processing and plotting in R. The .ods file is for enhanced visualization of population dynamics in the Levant (Socio_cultural_political_development_database_FAO2023.ods).

    “GLDAS” stores the raw data of the NASA Global Land Data Assimilation System datasets that can be read, extracted (variable name), and processed using code “8_GLDAS_read_extract_trend” from the respective folder. One folder contains data from 1975-2022 and a second the additional January and February 2023 data.

    “built_up” contains the landcover and built-up change data from 1975 to 2022. This folder is subdivided into two subfolders which contain the raw data and the already processed data. “raw_data” contains the unprocessed datasets and “derived_data” stores the cropped built_up datasets at 5 year intervals, e.g., “Levant_built_up_1975.tif”.

    Code structure

    1_MODIS_NDVI_hdf_file_extraction.R


    This is the first code chunk that refers to the extraction of MODIS data from the .hdf file format. The following packages must be installed and the raw data must be downloaded using a simple mass downloader, e.g., from Google Chrome. Packages: terra. Download MODIS data, after registration, from: https://lpdaac.usgs.gov/products/mod13q1v061/ or https://search.earthdata.nasa.gov/search (MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061, last accessed 09th of October 2023). The code reads a list of files, extracts the NDVI, and saves each file to a single .tif-file with the indication “NDVI”. Because the study area is quite large, we have to load three different (spatially) time series and merge them later. Note that the time series are temporally consistent.
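
    A hedged sketch of what such an extraction loop could look like with terra (reading HDF4 subdatasets depends on the local GDAL build, and selecting the NDVI layer by name is an assumption):

    library(terra)

    hdf_files <- list.files("your_directory_MODIS", pattern = "\\.hdf$", full.names = TRUE)
    for (f in hdf_files) {
      r <- rast(f)                          # all subdatasets of the MOD13Q1 granule
      ndvi <- r[[grep("NDVI", names(r))]]   # keep only the NDVI layer
      writeRaster(ndvi, sub("\\.hdf$", "_NDVI.tif", f), overwrite = TRUE)
    }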


    2_MERGE_MODIS_tiles.R


    In this code, we load and merge the three different stacks to produce large and consistent time series of NDVI imagery across the study area. We further use the package gtools to load the files in (1, 2, 3, 4, 5, 6, etc.). Here, we have three stacks from which we merge the first two (stack 1, stack 2) and store them. We then merge this stack with stack 3. We produce single files named NDVI_final_*consecutivenumber*.tif. Before saving the final output of single merged files, create a folder called “merged” and set the working directory to this folder, e.g., setwd("your directory_MODIS/merged").


    3_CROP_MODIS_merged_tiles.R


    Now we want to crop the derived MODIS tiles to our study area. We are using a mask, which is provided as .shp file in the repository, named "MERGED_LEVANT.shp". We load the merged .tif files and crop the stack with the vector. Saving to individual files, we name them “NDVI_merged_clip_*consecutivenumber*.tif. We now produced single cropped NDVI time series data from MODIS.
    The repository provides the already clipped and merged NDVI datasets.
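
    A minimal sketch of that cropping step with terra, assuming the merged files and the mask shapefile are in the working directory:

    library(terra)

    mask_vec <- vect("MERGED_LEVANT.shp")    # study-area mask provided in the repository
    merged <- list.files("merged", pattern = "^NDVI_final_.*\\.tif$", full.names = TRUE)
    for (i in seq_along(merged)) {
      clipped <- mask(crop(rast(merged[i]), mask_vec), mask_vec)   # crop, then mask to the outline
      writeRaster(clipped, paste0("NDVI_merged_clip_", i, ".tif"), overwrite = TRUE)
    }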


    4_TREND_analysis_NDVI.R


    Now we want to perform trend analysis on the derived data. The data we load are a little tricky, as they come as a 16-day return period across each year for a period of 22 years. Growing season sums contain MAM (March-May), JJA (June-August), and SON (September-November). December is represented as a single file, which means that the period DJF (December-February) is represented by 5 images instead of 6. For the last DJF period (December 2022), the data from January and February 2023 can be added. The code selects the respective images from the stack, depending on which period is under consideration. From these stacks, individual annually resolved growing season sums are generated and the slope is calculated. We can then extract the p-values of the trend and flag all values with a high confidence level (0.05). Using the ggplot2 package and the melt function from the reshape2 package, we can create a plot of the reclassified NDVI trends together with a local smoother (LOESS) of value 0.3.
    To increase comparability and understand the amplitude of the trends, z-scores were calculated and plotted, which show the deviation of the values from the mean. This has been done for the NDVI values as well as the GLDAS climate variables as a normalization technique.


    5_BUILT_UP_change_raster.R


    Let us look at the landcover changes now. We are working with the terra package and get raster data from here: https://ghsl.jrc.ec.europa.eu/download.php?ds=bu (last accessed 03. March 2023, 100 m resolution, global coverage). Here, one can download the temporal coverage that is aimed for and reclassify it using the code after cropping to the individual study area. I summed up the different rasters to characterize the built-up change in continuous values between 1975 and 2022.


    6_POPULATION_numbers_plot.R


    For this plot, one needs to load the .csv-file “Socio_cultural_political_development_database_FAO2023.csv” from the repository. The ggplot script provided produces the desired plot with all countries under consideration.


    7_YIELD_plot.R


    In this section, we are using the country productivity data from the supplement in the repository “yield_productivity” (e.g., "Jordan_yield.csv"). Each of the single country yield datasets is plotted in a ggplot and combined using the patchwork package in R.


    8_GLDAS_read_extract_trend


    The last code provides the basis for the trend analysis of the climate variables used in the paper. The raw data can be accessed at https://disc.gsfc.nasa.gov/datasets?keywords=GLDAS%20Noah%20Land%20Surface%20Model%20L4%20monthly&page=1 (last accessed 9th of October 2023). The raw data come in .nc file format and various variables can be extracted using the [“^a variable name”] command from the spatraster collection. Each time you run the code, this variable name must be adjusted to meet the requirements for the variables (see this link for abbreviations: https://disc.gsfc.nasa.gov/datasets/GLDAS_CLSM025_D_2.0/summary, last accessed 09th of October 2023; or the respective code chunk when reading a .nc file with the ncdf4 package in R), or run print(nc) from the code, or use names() on the spatraster collection.
    Choosing one variable, the code uses the MERGED_LEVANT.shp mask from the repository to crop and mask the data to the outline of the study area.
    From the processed data, trend analyses are conducted and z-scores are calculated following the code described above. However, annual trends require the frequency of the time series analysis to be set to value = 12. Regarding, e.g., rainfall, which is measured as annual sums and not means, the chunk r.sum=r.sum/12 has to be removed or set to r.sum=r.sum/1 to avoid calculating annual mean values (see other variables). Seasonal subsets can be calculated as described in the code. Here, 3-month subsets were chosen for growing seasons, e.g. March-May (MAM), June-August (JJA), September-November (SON), and DJF (December-February, including Jan/Feb of the consecutive year).
    From the data, mean values of 48 consecutive years are calculated and trend analyses are performed as described above. In the same way, p-values are extracted and 95 % confidence level values are marked with dots on the raster plot. This analysis can be performed with a much longer time series, other variables, and different spatial extents across the globe due to the availability of the GLDAS variables.

  13. forest cover data

    • kaggle.com
    zip
    Updated Apr 22, 2017
    Cite
    ShalvaRai16MCB0025 (2017). forest cover data [Dataset]. https://www.kaggle.com/shalv16mcb0025/forest-cover-data
    Explore at:
    Available download formats: zip (3155 bytes)
    Dataset updated
    Apr 22, 2017
    Authors
    ShalvaRai16MCB0025
    Description

    # Load required packages (install rgdal if it is not available)
    require(rgdal)
    require(sp)
    x <- "rgdal"
    if (!require(x, character.only = TRUE)) {
      install.packages(pkgs = x, dependencies = TRUE)
      require(x, character.only = TRUE)
    }

    # Location of the India map shapefile
    location <- "C:/Users/acer/Documents/R/data/India Map"
    india1 <- readOGR(dsn = location, "IND_adm1")

    # Plot India and inspect the object
    plot(india1)
    slotNames(india1)
    names(india1)
    head(india1@data)

    # State names available in the dataset
    head(india1$NAME_1, 10)

    # Sample plot of one state
    plot(india1[india1$NAME_1 == "Delhi", ], col = "red")
    title("Delhi")

    # Read the file that contains the forest information
    forestdata <- read.csv(file = "C:/Users/acer/Documents/R/data/Recorded_Forest_Area.csv",
                           stringsAsFactors = FALSE)
    head(forestdata)
    names(forestdata)

    # Column names are too long, let's change them
    colnames(forestdata) <- c("state", "statearea", "forestarea2005", "reserved", "protected",
                              "unclassed", "totalforestarea", "forestareapercent")
    names(forestdata)
    head(forestdata)

    # Now change factor to character
    india1$NAME_1 <- as.character(india1$NAME_1)
    forestdata$state <- as.character(forestdata$state)

    # Check whether the state names in the map and the csv file match
    india1$NAME_1 %in% forestdata$state

    # Return the entries that have a mismatch
    india1$NAME_1[which(!india1$NAME_1 %in% forestdata$state)]

    # The issue is with "and" in place of "&", and the state Uttaranchal,
    # which was later renamed Uttarakhand. Let us make the relevant changes:
    india1$NAME_1[grepl("Andaman and Nicobar", india1$NAME_1)] <- "Andaman & Nicobar"
    india1$NAME_1[grepl("Dadra and Nagar Haveli", india1$NAME_1)] <- "Dadra & Nagar Haveli"
    india1$NAME_1[grepl("Jammu and Kashmir", india1$NAME_1)] <- "Jammu & Kashmir"
    india1$NAME_1[grepl("Daman and Diu", india1$NAME_1)] <- "Daman & Diu"
    india1$NAME_1[grepl("Uttaranchal", india1$NAME_1)] <- "Uttarakhand"

    # Now check the matching again
    india1$NAME_1 %in% forestdata$state

