4 datasets found
  1. h

    DirtyWritingPrompts

    • huggingface.co
    Updated Jun 27, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nothing is Real (2024). DirtyWritingPrompts [Dataset]. https://huggingface.co/datasets/nothingiisreal/DirtyWritingPrompts
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 27, 2024
    Dataset authored and provided by
    Nothing is Real
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    11K rows of r/DirtyWritingPrompts including score (which is upvotes - downvotes) You probably want to remove the last 1K row because it is sorted from most upvoted to least and few hundred of the last samples have negative upvotes lmao. We also removed any story under 400 characters.

  2. KC_House Dataset -Linear Regression of Home Prices

    • kaggle.com
    zip
    Updated May 15, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    vikram amin (2023). KC_House Dataset -Linear Regression of Home Prices [Dataset]. https://www.kaggle.com/datasets/vikramamin/kc-house-dataset-home-prices
    Explore at:
    zip(776807 bytes)Available download formats
    Dataset updated
    May 15, 2023
    Authors
    vikram amin
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description
    1. Dataset: House pricing dataset containing 21 columns and 21613 rows.
    2. Programming Language : R
    3. Objective : To predict house prices by creating a model
    4. Steps : A) Import the dataset B) Install and run libraries C) Data Cleaning - Remove Null Values , Change Data Types , Dropping of Columns which are not important D) Data Analysis - (i)Linear Regression Model was used to establish the relationship between the dependent variable (price) and other independent variable (ii) Outliers were identified and removed (iii) Regression model was run once again after removing the outliers (iv) Multiple R- squared was calculated which indicated the independent variables can explain 73% change/ variation in the dependent variable (v) P value was less than that of alpha 0.05 which shows it is statistically significant. (vi) Interpreting the meaning of the results of the coefficients (vii) Checked the assumption of multicollinearity (viii) VIF(Variance inflation factor) was calculated for all the independent variables and their absolute value was found to be less than 5. Hence, there is not threat of multicollinearity and that we can proceed with the independent variables specified.
  3. FacialRecognition

    • kaggle.com
    zip
    Updated Dec 1, 2016
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    TheNicelander (2016). FacialRecognition [Dataset]. https://www.kaggle.com/petein/facialrecognition
    Explore at:
    zip(121674455 bytes)Available download formats
    Dataset updated
    Dec 1, 2016
    Authors
    TheNicelander
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    #https://www.kaggle.com/c/facial-keypoints-detection/details/getting-started-with-r #################################

    ###Variables for downloaded files data.dir <- ' ' train.file <- paste0(data.dir, 'training.csv') test.file <- paste0(data.dir, 'test.csv') #################################

    ###Load csv -- creates a data.frame matrix where each column can have a different type. d.train <- read.csv(train.file, stringsAsFactors = F) d.test <- read.csv(test.file, stringsAsFactors = F)

    ###In training.csv, we have 7049 rows, each one with 31 columns. ###The first 30 columns are keypoint locations, which R correctly identified as numbers. ###The last one is a string representation of the image, identified as a string.

    ###To look at samples of the data, uncomment this line:

    head(d.train)

    ###Let's save the first column as another variable, and remove it from d.train: ###d.train is our dataframe, and we want the column called Image. ###Assigning NULL to a column removes it from the dataframe

    im.train <- d.train$Image d.train$Image <- NULL #removes 'image' from the dataframe

    im.test <- d.test$Image d.test$Image <- NULL #removes 'image' from the dataframe

    ################################# #The image is represented as a series of numbers, stored as a string #Convert these strings to integers by splitting them and converting the result to integer

    #strsplit splits the string #unlist simplifies its output to a vector of strings #as.integer converts it to a vector of integers. as.integer(unlist(strsplit(im.train[1], " "))) as.integer(unlist(strsplit(im.test[1], " ")))

    ###Install and activate appropriate libraries ###The tutorial is meant for Linux and OSx, where they use a different library, so: ###Replace all instances of %dopar% with %do%.

    install.packages('foreach')

    library("foreach", lib.loc="~/R/win-library/3.3")

    ###implement parallelization im.train <- foreach(im = im.train, .combine=rbind) %do% { as.integer(unlist(strsplit(im, " "))) } im.test <- foreach(im = im.test, .combine=rbind) %do% { as.integer(unlist(strsplit(im, " "))) } #The foreach loop will evaluate the inner command for each row in im.train, and combine the results with rbind (combine by rows). #%do% instructs R to do all evaluations in parallel. #im.train is now a matrix with 7049 rows (one for each image) and 9216 columns (one for each pixel):

    ###Save all four variables in data.Rd file ###Can reload them at anytime with load('data.Rd')

    save(d.train, im.train, d.test, im.test, file='data.Rd')

    load('data.Rd')

    #each image is a vector of 96*96 pixels (96*96 = 9216). #convert these 9216 integers into a 96x96 matrix: im <- matrix(data=rev(im.train[1,]), nrow=96, ncol=96)

    #im.train[1,] returns the first row of im.train, which corresponds to the first training image. #rev reverse the resulting vector to match the interpretation of R's image function #(which expects the origin to be in the lower left corner).

    #To visualize the image we use R's image function: image(1:96, 1:96, im, col=gray((0:255)/255))

    #Let’s color the coordinates for the eyes and nose points(96-d.train$nose_tip_x[1], 96-d.train$nose_tip_y[1], col="red") points(96-d.train$left_eye_center_x[1], 96-d.train$left_eye_center_y[1], col="blue") points(96-d.train$right_eye_center_x[1], 96-d.train$right_eye_center_y[1], col="green")

    #Another good check is to see how variable is our data. #For example, where are the centers of each nose in the 7049 images? (this takes a while to run): for(i in 1:nrow(d.train)) { points(96-d.train$nose_tip_x[i], 96-d.train$nose_tip_y[i], col="red") }

    #there are quite a few outliers -- they could be labeling errors. Looking at one extreme example we get this: #In this case there's no labeling error, but this shows that not all faces are centralized idx <- which.max(d.train$nose_tip_x) im <- matrix(data=rev(im.train[idx,]), nrow=96, ncol=96) image(1:96, 1:96, im, col=gray((0:255)/255)) points(96-d.train$nose_tip_x[idx], 96-d.train$nose_tip_y[idx], col="red")

    #One of the simplest things to try is to compute the mean of the coordinates of each keypoint in the training set and use that as a prediction for all images colMeans(d.train, na.rm=T)

    #To build a submission file we need to apply these computed coordinates to the test instances: p <- matrix(data=colMeans(d.train, na.rm=T), nrow=nrow(d.test), ncol=ncol(d.train), byrow=T) colnames(p) <- names(d.train) predictions <- data.frame(ImageId = 1:nrow(d.test), p) head(predictions)

    #The expected submission format has one one keypoint per row, but we can easily get that with the help of the reshape2 library:

    install.packages('reshape2')

    library(...

  4. d

    Forwarding Addresses & New Occupiers | Sagacity GAS-R | Update Goneaway...

    • datarade.ai
    .csv, .xls, .txt
    Updated Feb 28, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sagacity (2017). Forwarding Addresses & New Occupiers | Sagacity GAS-R | Update Goneaway Records | 16m Records | UK Wide Coverage | 400k Records added every Month [Dataset]. https://datarade.ai/data-products/forwarding-addresses-new-occupiers-sagacity-gas-r-updat-sagacity
    Explore at:
    .csv, .xls, .txtAvailable download formats
    Dataset updated
    Feb 28, 2017
    Dataset authored and provided by
    Sagacity
    Area covered
    United Kingdom
    Description

    What is it? GAS-R contains over 16m relocatable individuals with over 8m new occupiers. It is updated with an average of 400k records a month, sourced from:

    • Local authorities
    • Credit reference agencies
    • Mover and transactional data
    • A Leading UK Utility Company
    • Edited electoral roll
    • Our bespoke databases

    Use cases - Identify prospects and target them to “consider your brand when they move” - Support cross-selling to existing customers who are moving, but equally to prospects who are moving - Apply the data to target people or households on the 12, 24 and 36 month anniversary of their move - Reactivate lapsed / previous customers - Increase spend and consumer lifetime value through repeat purchases

    Additional Insights Our forwarding address file works best when used alongside our other suppressions, GAS and TBR. While useful to remove one set of suppressions from your data, you still run the risk of having unsuitable individuals in your marketing lists – by utilising our full suite of suppressions this removes any worry.

  5. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Nothing is Real (2024). DirtyWritingPrompts [Dataset]. https://huggingface.co/datasets/nothingiisreal/DirtyWritingPrompts

DirtyWritingPrompts

nothingiisreal/DirtyWritingPrompts

Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 27, 2024
Dataset authored and provided by
Nothing is Real
License

Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

11K rows of r/DirtyWritingPrompts including score (which is upvotes - downvotes) You probably want to remove the last 1K row because it is sorted from most upvoted to least and few hundred of the last samples have negative upvotes lmao. We also removed any story under 400 characters.

Search
Clear search
Close search
Google apps
Main menu