This data package includes the underlying data to replicate the charts, tables, and calculations presented in The US Revenue Implications of President Trump’s 2025 Tariffs, PIIE Briefing 25-2.
If you use the data, please cite as:
McKibbin, Warwick, and Geoffrey Shuetrim. 2025. The US Revenue Implications of President Trump’s 2025 Tariffs. PIIE Briefing 25-2. Washington: Peterson Institute for International Economics.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
You are an analyst at "Megaline," a federal mobile operator. The company offers two tariff plans to customers: "Smart" and "Ultra." To adjust the advertising budget, the commercial department wants to understand which tariff generates more revenue.
You need to conduct a preliminary analysis of the tariffs on a small sample of customers. You have data on 500 "Megaline" users: who they are, where they are from, which tariff they use, and how many calls they made and messages they sent in 2018. You need to analyze customer behavior and conclude which tariff is better.
"Smart" Tariff:
- Monthly fee: 550 rubles
- Included: 500 minutes of calls, 50 messages, and 15 GB of internet traffic
- Cost of services beyond the tariff package:
  1. Call minute: 3 rubles (Megaline always rounds up minutes and megabytes. If the user talked for just 1 second, it counts as a whole minute);
  2. Message: 3 rubles;
  3. 1 GB of internet traffic: 200 rubles.
"Ultra" Tariff:
- Monthly fee: 1950 rubles
- Included: 3000 minutes of calls, 1000 messages, and 30 GB of internet traffic
- Cost of services beyond the tariff package:
  1. Call minute: 1 ruble;
  2. Message: 1 ruble;
  3. 1 GB of internet traffic: 150 rubles.
Note: Megaline always rounds up seconds to minutes and megabytes to gigabytes. Each call is rounded up individually: even if it lasted just 1 second, it is counted as 1 minute. For web traffic, individual sessions are not rounded separately. Instead, the total amount for the month is rounded up. If a subscriber uses 1025 megabytes in a month, they are charged for 2 gigabytes.
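The two rounding rules in the note can be sketched as follows. This is a minimal illustration, not the operator's billing code: the function names are hypothetical, and the 1024 MB per GB conversion is an assumption (the 1025 MB example in the note is consistent with it but does not confirm it).

```python
import math

# Assumption: 1 GB = 1024 MB. The note's example (1025 MB billed as 2 GB)
# fits this value but would also fit 1000 MB per GB.
MB_PER_GB = 1024


def billed_minutes(call_durations):
    """Round each call up to a whole minute, then sum.

    Durations are in minutes. A zero-length (missed) call adds 0 minutes.
    """
    return sum(math.ceil(duration) for duration in call_durations)


def billed_gigabytes(monthly_mb):
    """Round the month's total traffic (in MB) up to whole gigabytes."""
    return math.ceil(monthly_mb / MB_PER_GB)
```

Calls are rounded one by one before summing, while traffic is summed first and rounded once, so the order of the two operations differs between the functions.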
Step 1: Open the file with data and study the general information
File paths:
- /datasets/calls.csv
- /datasets/internet.csv
- /datasets/messages.csv
- /datasets/tariffs.csv
- /datasets/users.csv
Step 2: Prepare the data
- Convert data to the required types;
- Find and fix errors in the data, if any.
Explain what errors you found and how you fixed them. You will find calls with zero duration in the data. This is not an error: missed calls are indicated by zeros, so they do not need to be deleted.
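As an illustration of the type-conversion step, here is a minimal sketch for the calls table using only the standard library. The sample rows, the `id` values, and the `YYYY-MM-DD` date format are assumptions made for the example; the real calls.csv may differ. `parse_call` is a hypothetical helper name.

```python
import csv
import io
from datetime import date

# Hypothetical two-row sample shaped like the calls table; not real data.
SAMPLE = """id,call_date,duration,user_id
1000_0,2018-07-25,0.0,1000
1000_1,2018-08-17,2.85,1000
"""


def parse_call(row):
    """Convert one raw CSV row (all strings) into typed values."""
    return {
        "id": row["id"],
        # Assumption: dates are stored as YYYY-MM-DD.
        "call_date": date.fromisoformat(row["call_date"]),
        "duration": float(row["duration"]),
        "user_id": int(row["user_id"]),
    }


calls = [parse_call(row) for row in csv.DictReader(io.StringIO(SAMPLE))]
```

The zero-duration row is kept, in line with the instruction that missed calls are not errors. Once `call_date` is a real date, `call_date.month` gives the key needed for per-month aggregation.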
For each user, calculate:
- Number of calls made and minutes spent per month;
- Number of messages sent per month;
- Amount of internet traffic used per month;
- Monthly revenue from each user (subtract the free limit from the total number of calls, messages, and internet traffic; multiply the remainder by the value from the tariff plan; add the corresponding tariff plan's subscription fee).
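The monthly revenue rule above can be sketched like this. The prices and limits come from the "Smart" and "Ultra" descriptions; the dictionary layout, the lowercase keys, and the function name are hypothetical. The sketch assumes minutes and gigabytes have already been rounded up according to the operator's rules, and it clamps each overage at zero so that unused allowance never reduces the bill.

```python
# Values taken from the tariff descriptions; the structure is illustrative.
TARIFFS = {
    "smart": {"fee": 550, "minutes": 500, "messages": 50, "gb": 15,
              "per_minute": 3, "per_message": 3, "per_gb": 200},
    "ultra": {"fee": 1950, "minutes": 3000, "messages": 1000, "gb": 30,
              "per_minute": 1, "per_message": 1, "per_gb": 150},
}


def monthly_revenue(minutes, messages, gb, tariff):
    """Subscription fee plus charges for usage beyond the included limits.

    `minutes` and `gb` are expected to be already-rounded billed amounts.
    """
    plan = TARIFFS[tariff]
    extra_minutes = max(0, minutes - plan["minutes"])
    extra_messages = max(0, messages - plan["messages"])
    extra_gb = max(0, gb - plan["gb"])
    return (plan["fee"]
            + extra_minutes * plan["per_minute"]
            + extra_messages * plan["per_message"]
            + extra_gb * plan["per_gb"])
```

For example, a "Smart" user with 510 minutes, 60 messages, and 17 GB pays 550 + 10·3 + 10·3 + 2·200 = 1010 rubles.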
Step 3: Analyze the data
Describe the behavior of the operator's customers based on the sample. How many minutes of calls, how many messages, and how much internet traffic do users of each tariff need per month? Calculate the average, variance, and standard deviation. Create histograms. Describe the distributions.
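The three summary statistics can be computed with Python's standard `statistics` module, as in the sketch below. The helper name `describe` is hypothetical, and the choice of the sample variance (divisor n − 1) rather than the population variance is an assumption: it fits the task's framing of the 500 users as a sample, but the task does not specify it.

```python
import statistics


def describe(values):
    """Return the mean, sample variance, and sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std": statistics.stdev(values),          # square root of the above
    }
```

The function would be called once per tariff and per metric, for example on the list of monthly minutes of all "Smart" users, and the results compared across tariffs.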
Step 4: Test hypotheses
- The average revenue of users of the "Ultra" and "Smart" tariffs is different;
- The average revenue of users from Moscow differs from the revenue of users from other regions.
Moscow is written as 'Москва' in the data; you can use this value when checking the hypothesis.
Set the threshold alpha value yourself.
Explain:
- How you formulated the null and alternative hypotheses;
- Which criterion you used to test the hypotheses and why.
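One possible criterion for both hypotheses is Welch's two-sample t-test, which compares two means without assuming equal variances. The task leaves the choice of criterion open, so this is an assumption rather than the required method. The null hypothesis would be that the two groups have equal mean revenue, and the alternative that the means differ (two-sided). The sketch below computes only the test statistic and the Welch-Satterthwaite degrees of freedom with the standard library; `welch_t` is a hypothetical name.

```python
import math
import statistics


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # Squared standard error of each group's mean (sample variance / n).
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df
```

To obtain a p-value and compare it with the chosen alpha, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test if SciPy is available.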
Step 5: Write a general conclusion
Formatting: Perform the task in Jupyter Notebook. Put the program code in cells of type `code`, and the textual explanations in cells of type `markdown`. Apply formatting and headers.
Table `users` (user information):
- `user_id`: unique user identifier
- `first_name`: user's first name
- `last_name`: user's last name
- `age`: user's age (years)
- `reg_date`: date of tariff connection (day, month, year)
- `churn_date`: date of tariff discontinuation (if the value is missing, the tariff was still active at the time of data extraction)
- `city`: user's city of residence
- `tariff`: name of the tariff plan
Table `calls` (call information):
- `id`: unique call number
- `call_date`: call date
- `duration`: call duration in minutes
- `user_id`: identifier of the user who made the call
Table `messages` (message information):
- `id`: unique message number
- `message_date`: message date
- `user_id`: identifier of the user who sent the message
Table `internet` (internet session information):
- `id`: unique session number
- `mb_used`: amount of internet traffic used during the session (in megabytes)
- `session_date`: internet session date
- `user_id`: user identifier
Table `tariffs` (tariff information):
- `tariff_name`: tariff name
- `rub_monthly_fee`: monthly subscription fee in rubles
- `minutes_included`: number of call minutes included per month
- `messages_included...
Daily overview of federal revenue collections such as income tax deposits, customs duties, fees for government service, fines, and loan repayments.