This research study was conducted to analyze the (potential) relationship between hardware and data set sizes. 100 data scientists from France between Jan-2016 and Aug-2016 were interviewed in order to have exploitable data. Therefore, this sample might not be representative of the true population.
What can you do with the data?
I did not find any past research on a similar scale. You are free to play with this data set. For re-usage of this data set out of Kaggle, please contact the author directly on Kaggle (use "Contact User"). Please mention:
Arbitrarily, we chose characteristics to describe Data Scientists and data set sizes.
Data set size:
For the data, it uses the following fields (DS = Data Scientist, W = Workstation):
You should expect potential noise in the data set. It might not be "free" of internal contradictions, as with all researches.
A bike-sharing system is a service in which bikes are made available for shared use to individuals on a short term basis for a price or free. Many bike share systems allow people to borrow a bike from a "dock" which is usually computer-controlled wherein the user enters the payment information, and the system unlocks it. This bike can then be returned to another dock belonging to the same system.
A US bike-sharing provider BoomBikes has recently suffered considerable dip in their revenue due to the Corona pandemic. The company is finding it very difficult to sustain in the current market scenario. So, it has decided to come up with a mindful business plan to be able to accelerate its revenue.
In such an attempt, BoomBikes aspires to understand the demand for shared bikes among the people. They have planned this to prepare themselves to cater to the people's needs once the situation gets better all around and stand out from other service providers and make huge profits.
They have contracted a consulting company to understand the factors on which the demand for these shared bikes depends. Specifically, they want to understand the factors affecting the demand for these shared bikes in the American market. The company wants to know:
Based on various meteorological surveys and people's styles, the service provider firm has gathered a large dataset on daily bike demands across the American market based on some factors.
You are required to model the demand for shared bikes with the available independent variables. It will be used by the management to understand how exactly the demands vary with different features. They can accordingly manipulate the business strategy to meet the demand levels and meet the customer's expectations. Further, the model will be a good way for management to understand the demand dynamics of a new market.
In the dataset provided, you will notice that there are three columns named 'casual', 'registered', and 'cnt'. The variable 'casual' indicates the number casual users who have made a rental. The variable 'registered' on the other hand shows the total number of registered users who have made a booking on a given day. Finally, the 'cnt' variable indicates the total number of bike rentals, including both casual and registered. The model should be built taking this 'cnt' as the target variable.
When you're done with model building and residual analysis and have made predictions on the test set, just make sure you use the following two lines of code to calculate the R-squared score on the test set.
python
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)
- where y_test is the test data set for the target variable, and y_pred is the variable containing the predicted values of the target variable on the test set.
- Please perform this step as the R-squared score on the test set holds as a benchmark for your model.
Not seeing a result you expected?
Learn how you can add new datasets to our index.
This research study was conducted to analyze the (potential) relationship between hardware and data set sizes. 100 data scientists from France between Jan-2016 and Aug-2016 were interviewed in order to have exploitable data. Therefore, this sample might not be representative of the true population.
What can you do with the data?
I did not find any past research on a similar scale. You are free to play with this data set. For re-usage of this data set out of Kaggle, please contact the author directly on Kaggle (use "Contact User"). Please mention:
Arbitrarily, we chose characteristics to describe Data Scientists and data set sizes.
Data set size:
For the data, it uses the following fields (DS = Data Scientist, W = Workstation):
You should expect potential noise in the data set. It might not be "free" of internal contradictions, as with all researches.