Data from Fortune 500's 2023 ranking.
Includes data on the top 1000 companies with additional info (stock symbol/*ticker*, CEO name).
Update (New dataset): 2024 Fortune 1000 Companies
From Investopedia:
The Fortune 1000 is an annual list of the 1000 largest American companies maintained by the popular magazine Fortune. Fortune ranks the eligible companies by revenue generated from core operations, discontinued operations, and consolidated subsidiaries. Since revenue is the basis for inclusion, every company is authorized to operate in the United States and files a 10-K or comparable financial statement with a government agency.
Fortune magazine publishes this list every year, and versions of it can be found from different sources. The datasets available this year were missing some features. This dataset was built by scraping the standard features as well as what's included under Company Info (such as CEO, ticker and website) from the Fortune magazine website. Details on how the data was generated can be found in this notebook, where a few of the features are also visualized.
The source code of the 2023 Fortune 500 ranking page includes 1000 companies. A reference page (slug) with additional info is included for each company; these pages were also scraped to complete the dataset.
Available formats: csv, parquet
Features are as follows:
[Note: References to datatypes are relevant when using the parquet file; Labels refer to the original website names]
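A CSV file stores every value as text, so the datatypes mentioned in the note have to be restored by hand when loading the csv version. The following is a minimal sketch using only the standard library. The column names (`rank`, `company`, `ticker`, `revenue`) and the sample rows are placeholders chosen for illustration, not the dataset's actual labels or values.

```python
import csv
import io

# Hypothetical sample in the shape of a csv export; the column names and
# rows are placeholders, not values taken from the dataset.
SAMPLE_CSV = """rank,company,ticker,revenue
1,Example Corp A,EXA,1000.5
2,Example Corp B,,250.0
"""

def load_rows(text):
    """Parse CSV text and restore the types that CSV cannot store."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            "rank": int(raw["rank"]),
            "company": raw["company"],
            # An empty ticker field becomes None instead of "".
            "ticker": raw["ticker"] or None,
            "revenue": float(raw["revenue"]),
        })
    return rows

rows = load_rows(SAMPLE_CSV)
print(rows[0]["rank"] + 1)  # arithmetic works because rank is now an int
```

With the parquet file this conversion step should not be needed, since that format carries column types alongside the data.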
This statistic shows the ranking of the top 10 biotech and pharmaceutical companies worldwide, based on revenue. The values are based on a 2025 database. U.S. pharmaceutical company Pfizer was ranked first, with a total revenue of around ** billion U.S. dollars.
Biotech and pharmaceutical companies
Pharmaceutical companies are best known for manufacturing pharmaceutical drugs. These drugs aim to diagnose, cure, treat, or prevent diseases. The pharmaceutical sector represents a huge industry, with the global pharmaceutical market being worth around *** trillion U.S. dollars. The best-known global pharmaceutical players are Pfizer, Merck, and Johnson & Johnson from the U.S., Novartis and Roche from Switzerland, Sanofi from France, etc. Most of these companies are involved not only in the pure pharmaceutical business, but also manufacture medical technology, consumer health products, vaccines, etc.
There are both pure play biotechnology companies and pharmaceutical companies which, among other products, also produce biotech products within their biotechnological divisions. Most of the leading global pharmaceutical companies have biopharmaceutical divisions. Although not a pure play biotech firm, Roche from Switzerland is among the companies with the largest revenues from biotechnology products worldwide. In contrast, California-based company Amgen was one of the world’s first large pure play biotech companies.
Biotech companies use biotechnology to generate their products, most often medical drugs or agricultural genetic engineering. The latter segment is dominated by companies like Bayer CropScience and Syngenta. The United Nations Convention on Biological Diversity defines biotechnology as follows: "Any technological application that uses biological systems, living organisms, or derivatives thereof, to make or modify products or processes for specific use."
In fact, biotechnology is thousands of years old, used in agriculture, food manufacturing and medicine.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Top 10 sectors of unit weight in GIRCN.
https://www.industryselect.com/license
The U.S. manufacturing sector plays a central role in the economy, accounting for 20% of U.S. capital investment, 60% of the nation's exports and 70% of business R&D. Overall, the sector's market size, measured in terms of revenue, is roughly $6 trillion, making it a major industry to do business with. So which U.S. states are the biggest for manufacturing? This article will explore the nation's top manufacturing states, measured by number of employees, based on MNI's database of 400,000 U.S. manufacturing companies.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Top 10 sectors of disparity in the weight in GIRCN.
As of June 2024, the most popular database management system (DBMS) worldwide was Oracle, with a ranking score of *******; MySQL and Microsoft SQL Server rounded out the top three. Although the database management industry contains some of the largest companies in the tech industry, such as Microsoft, Oracle and IBM, a number of free and open-source DBMSs such as PostgreSQL and MariaDB remain competitive.
Database Management Systems
As the name implies, DBMSs provide a platform through which developers can organize, update, and control large databases. Given the business world’s growing focus on big data and data analytics, knowledge of SQL programming languages has become an important asset for software developers around the world, and database management skills are seen as highly desirable. In addition to providing developers with the tools needed to operate databases, DBMSs are also integral to the way that consumers access information through applications, which further illustrates the importance of the software.
Business-critical Data Types: We offer access to robust datasets sourced from over 13M job ads daily. Track companies’ growth, market focus, technological shifts, planned geographic expansion, and more:
- Identify new business opportunities
- Identify and forecast industry & technological trends
- Help identify the jobs, teams, and business units that have the highest impact on corporate goals
- Identify most in-demand skills and qualifications for key positions
Fresh Datasets: We regularly update our datasets, assuring you access to the latest data and allowing for timely analysis of rapidly evolving markets & dynamic businesses.
Historical Datasets: We maintain at your disposal historical datasets, allowing for comprehensive, reliable, and statistically sound historical analysis, trend identification, and forecasting.
Easy Access and Retrieval: Our job listing datasets are available in industry-standard, convenient JSON and CSV formats. These structured formats make our datasets compatible with machine learning, artificial intelligence training, and similar applications. The historical data retrieval process is quick and reliable thanks to our robust, easy-to-implement API integration.
Datasets for investors: Investment firms and hedge funds use our datasets to better inform their investment decisions by gaining up-to-date, reliable insights into workforce growth, geographic expansion, market focus, technology shifts, and other factors of start-ups and established companies.
Datasets for businesses: Our datasets are used by retailers, manufacturers, real estate agents, and many other types of B2B & B2C businesses to stay ahead of the curve. They can gain insights into the competitive landscape, technology, and product adoption trends as well as power their lead generation processes with data-driven decision-making.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data in this dataset were collected through a survey of Latvian society (2021) aimed at identifying high-value data sets for Latvia, i.e. data sets that, in the view of Latvian society, could create value for the Latvian economy and society.
The survey was created for both individuals and businesses.
It is being made public both to act as supplementary data for the paper "Towards enrichment of the open government data: a stakeholder-centered determination of High-Value Data sets for Latvia" (author: Anastasija Nikiforova, University of Latvia) and so that other researchers can use these data in their own work.
The survey was distributed among Latvian citizens and organisations. The structure of the survey is available in the supplementary file (see Survey_HighValueDataSets.odt).
***Description of the data in this data set: structure of the survey and pre-defined answers (if any)***
1. Have you ever used open (government) data? - {(1) yes, once; (2) yes, there has been a little experience; (3) yes, continuously; (4) no, it wasn’t needed for me; (5) no, have tried but have failed}
2. How would you assess the value of open government data that are currently available for your personal use or your business? - 5-point Likert scale, where 1 – any to 5 – very high
3. If you ever used the open (government) data, what was the purpose of using them? - {(1) Have not had to use; (2) to identify the situation for an object or an event (e.g. Covid-19 current state); (3) data-driven decision-making; (4) for the enrichment of my data, i.e. by supplementing them; (5) for better understanding of decisions of the government; (6) awareness of governments’ actions (increasing transparency); (7) forecasting (e.g. trends etc.); (8) for developing data-driven solutions that use only the open data; (9) for developing data-driven solutions, using open data as a supplement to existing data; (10) for training and education purposes; (11) for entertainment; (12) other (open-ended question)}
4. What category(ies) of “high value datasets” is, in your opinion, able to create added value for society or the economy? - {(1) Geospatial data; (2) Earth observation and environment; (3) Meteorological; (4) Statistics; (5) Companies and company ownership; (6) Mobility}
5. To what extent do you think the current data catalogue of Latvia’s Open data portal corresponds to the needs of data users/ consumers? - 10-point Likert scale, where 1 – no data are useful, but 10 – fully correspond, i.e. all potentially valuable datasets are available
6. Which of the current data categories in Latvia’s open data portals, in your opinion, most corresponds to the “high value dataset”? - {(1) Foreign affairs; (2) business economy; (3) energy; (4) citizens and society; (5) education and sport; (6) culture; (7) regions and municipalities; (8) justice, internal affairs and security; (9) transports; (10) public administration; (11) health; (12) environment; (13) agriculture, food and forestry; (14) science and technologies}
7. Which of them form your TOP-3? - {(1) Foreign affairs; (2) business economy; (3) energy; (4) citizens and society; (5) education and sport; (6) culture; (7) regions and municipalities; (8) justice, internal affairs and security; (9) transports; (10) public administration; (11) health; (12) environment; (13) agriculture, food and forestry; (14) science and technologies}
8. How would you assess the value of the following data categories?
8.1. sensor data - 5-point Likert scale, where 1 – not needed to 5 – highly valuable
8.2. real-time data - 5-point Likert scale, where 1 – not needed to 5 – highly valuable
8.3. geospatial data - 5-point Likert scale, where 1 – not needed to 5 – highly valuable
9. What would be these datasets? I.e. what (sub)topic could these data be associated with? - open-ended question
10. Which of the data sets currently available could be valuable and useful for society and businesses? - open-ended question
11. Which of the data sets currently NOT available in Latvia’s open data portal could, in your opinion, be valuable and useful for society and businesses? - open-ended question
12. How did you define them? - {(1) Subjective opinion; (2) experience with data; (3) filtering out the most popular datasets, i.e. basing them on public opinion; (4) other (open-ended question)}
13. How high could the value of these data sets be for you or your business? - 5-point Likert scale, where 1 – not valuable, 5 – highly valuable
14. Do you represent any company/ organization (are you working anywhere)? (if “yes”, please, fill out the survey twice, i.e. as an individual user AND a company representative) - {yes; no; I am an individual data user; other (open-ended)}
15. What industry/ sector does your company/ organization belong to? (if you do not work at the moment, please, choose the last option) - {Information and communication services; Financial and insurance activities; Accommodation and catering services; Education; Real estate operations; Wholesale and retail trade; repair of motor vehicles and motorcycles; transport and storage; construction; water supply; waste water; waste management and recovery; electricity, gas supply, heating and air conditioning; manufacturing industry; mining and quarrying; agriculture, forestry and fisheries; professional, scientific and technical services; operation of administrative and service services; public administration and defence; compulsory social insurance; health and social care; art, entertainment and recreation; activities of households as employers; CSO/NGO; I am not a representative of any company}
16. To which category does your company/ organization belong in terms of its size? - {small; medium; large; self-employed; I am not a representative of any company}
17. What is the age group that you belong to? (if you are an individual user, not a company representative) - {11..15, 16..20, 21..25, 26..30, 31..35, 36..40, 41..45, 46+, “do not want to reveal”}
18. Please, indicate your education or a scientific degree that corresponds most to you? (if you are an individual user, not a company representative) - {master degree; bachelor’s degree; Dr. and/ or PhD; student (bachelor level); student (master level); doctoral candidate; pupil; do not want to reveal these data}
***Format of the file***
.xls, .csv (for the first spreadsheet only), .odt
***Licenses or restrictions***
CC-BY
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Gross Domestic Product (GDP) in Iran was worth 436.91 billion US dollars in 2024, according to official data from the World Bank. The GDP value of Iran represents 0.41 percent of the world economy. This dataset provides - Iran GDP - actual values, historical data, forecast, chart, statistics, economic calendar and news.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for MANUFACTURING PMI reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
https://www.mordorintelligence.com/privacy-policy
Cloud Computing Market Growth | Industry Analysis, Size & Forecast Report
Dataset updated: Jun 27, 2024
Dataset authored and provided by: Mordor Intelligence
License: https://www.mordorintelligence.com/privacy-policy
Time period covered: 2019 - 2029
Area covered: Global
Variables measured: CAGR, Market size, Market share analysis, Global trends, Industry forecast
Description: The Cloud Computing Market size is estimated at USD 0.68 trillion in 2024, and is expected to reach USD 1.44 trillion by 2029, growing at a CAGR of 16.40% during the forecast period (2024-2029).
Report Attribute
Study Period | 2019-2029 |
Market Size (2024) | USD 0.68 Trillion |
Market Size (2029) | USD 1.44 Trillion |
CAGR (2024 - 2029) | 16.40% |
Fastest Growing Market | Asia Pacific |
Largest Market | North America |
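The CAGR row relates the two market-size figures through the standard compound-growth formula. As a rough sketch, the function below recomputes it from the rounded sizes quoted in this entry. It yields about 16.2%, slightly below the reported 16.40%. That gap is presumably due to rounding of the published sizes, which is an assumption rather than something the report states.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Market sizes quoted in this entry, in USD trillion, over 2024-2029.
rate = cagr(0.68, 1.44, 2029 - 2024)
print(f"{rate:.2%}")
```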
Quantitative Units: Revenue in USD Billion, Volumes in Units, Pricing in USD
Regions and Countries Covered:
North America | United States, Canada |
Europe | Germany, United Kingdom, Italy, France, Russia, and Rest of Europe |
Asia-Pacific | India, China, Japan, South Korea, and Rest of Asia-Pacific |
Latin America | Brazil, Mexico, Argentina, and Rest of Latin America |
Middle East and Africa | Brazil, Mexico, Argentina, and the Rest of Middle East and Africa |
Industry Segmentation Covered:
By Cloud Computing: IaaS, SaaS, PaaS
By End-User: IT and Telecom, BFSI, Retail and Consumer Goods, Manufacturing, Healthcare, Media and Entertainment
Market Players Covered: Amazon Web Services, Google LLC, Microsoft Corporation, Alibaba Cloud, and Salesforce
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Gross Domestic Product (GDP) in Nigeria was worth 187.76 billion US dollars in 2024, according to official data from the World Bank. The GDP value of Nigeria represents 0.18 percent of the world economy. This dataset provides the latest reported value for - Nigeria GDP - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Solar Footprints in California
This GIS dataset consists of polygons that represent the footprints of solar powered electric generation facilities and related infrastructure in California called Solar Footprints. The location of solar footprints was identified using other existing solar footprint datasets from various sources along with imagery interpretation. CEC staff reviewed footprints identified with imagery and digitized polygons to match the visual extent of each facility. Previous datasets of existing solar footprints used to locate solar facilities include:
GIS Layers: (1) California Solar Footprints, (2) UC Berkeley Solar Points, (3) Kruitwagen et al. 2021, (4) BLM Renewable Project Facilities, (5) Quarterly Fuel and Energy Report (QFER)
Imagery Datasets: Esri World Imagery, USGS National Agriculture Imagery Program (NAIP), 2020 SENTINEL 2 Satellite Imagery, 2023
Solar facilities with large footprints such as parking lot solar, large rooftop solar, and ground solar were included in the solar footprint dataset. Small scale solar (approximately less than 0.5 acre) and residential footprints were not included. No other data was used in the production of these shapes. Definitions for the solar facilities identified via imagery are subjective and described as follows:
Rooftop Solar: Solar arrays located on rooftops of large buildings.
Parking lot Solar: Solar panels on parking lots roughly larger than 1 acre, or clusters of solar panels in adjacent parking lots.
Ground Solar: Solar panels located on ground roughly larger than 1 acre, or large clusters of smaller scale footprints.
Once all footprints identified by the above criteria were digitized for all California counties, the features were visually classified into ground, parking and rooftop categories. The features were also classified into rural and urban types using the 42 U.S. Code § 1490 definition for rural.
In addition, the distance to the closest substation and the percentile category of this distance (e.g. 0-25th percentile, 25th-50th percentile) were also calculated. The coverage provided by this data set should not be assumed to be a complete accounting of solar footprints in California. Rather, this dataset represents an attempt to improve upon existing solar feature datasets and to update the inventory of "large" solar footprints via imagery, especially in recent years since previous datasets were published. This procedure produced a total solar project footprint of 150,250 acres.
Classifying these footprints and isolating the large utility-scale projects from the smaller rooftop solar projects identified in the data set is difficult. The data was gathered based on imagery, and project information that could link multiple adjacent solar footprints under one larger project is not known. However, partitioning all solar footprints that are at least partly outside of the techno-economic exclusions and greater than 7 acres yields a total footprint size of 133,493 acres. These can be approximated as utility-scale footprints.
Metadata: (1) CBI Solar Footprints
Abstract: Conservation Biology Institute (CBI) created this dataset of solar footprints in California after it was found that no such dataset was publicly available at the time (Dec 2015-Jan 2016). This dataset is used to help identify where current ground based, mostly utility scale, solar facilities are being constructed and will be used in a larger landscape intactness model to help guide future development of renewable energy projects. The process of digitizing these footprints first began by utilizing an excel file from the California Energy Commission with lat/long coordinates of some of the older and bigger locations. After projecting those points and locating the facilities utilizing NAIP 2014 imagery, the developed area around each facility was digitized.
While interpreting imagery, there were some instances where a fenced perimeter was clearly seen and was slightly larger than the actual footprint. For those cases the footprint followed the fenced perimeter since it limits wildlife movement through the area. In other instances, it was clear that the top soil had been scraped of any vegetation, even outside of the primary facility footprint. These footprints included the areas that were scraped within the fencing since, especially in desert systems, it has been near permanently altered.
Other sources that guided the search for solar facilities included the Energy Justice Map, developed by the Energy Justice Network, which can be found here: https://www.energyjustice.net/map/searchobject.php?gsMapsize=large&giCurrentpageiFacilityid;=1&gsTable;=facility&gsSearchtype;=advanced
The Solar Energy Industries Association’s “Project Location Map”, which can be found here: https://www.seia.org/map/majorprojectsmap.php also assisted in locating newer facilities, along with the "Power Plants" shapefile, updated on December 16th, 2015, downloaded from the U.S. Energy Information Administration, located here: https://www.eia.gov/maps/layer_info-m.cfm
There were some facilities that were stumbled upon while searching for others; most of these are smaller scale sites located near farm infrastructure. Other sites were located by contacting counties that had solar developments within the county. Still others were located by sleuthing around for proposals and company websites that had images of the completed facility. These helped to locate the most recently developed sites, and these sites were digitized based on landmarks such as ditches, trees, roads and other permanent structures.
Metadata: (2) UC Berkeley Solar Points
UC Berkeley report containing point location for energy facilities across the United States.
2022_utility-scale_solar_data_update.xlsm (live.com)
Metadata: (3) Kruitwagen et al. 2021
Abstract: Photovoltaic (PV) solar energy generating capacity has grown by 41 per cent per year since 2009. Energy system projections that mitigate climate change and aid universal energy access show a nearly ten-fold increase in PV solar energy generating capacity by 2040. Geospatial data describing the energy system are required to manage generation intermittency, mitigate climate change risks, and identify trade-offs with biodiversity, conservation and land protection priorities caused by the land-use and land-cover change necessary for PV deployment. Currently available inventories of solar generating capacity cannot fully address these needs. Here we provide a global inventory of commercial-, industrial- and utility-scale PV installations (that is, PV generating stations in excess of 10 kilowatts nameplate capacity) by using a longitudinal corpus of remote sensing imagery, machine learning and a large cloud computation infrastructure. We locate and verify 68,661 facilities, an increase of 432 per cent (in number of facilities) on previously available asset-level data. With the help of a hand-labelled test set, we estimate global installed generating capacity to be 423 gigawatts (−75/+77 gigawatts) at the end of 2018. Enrichment of our dataset with estimates of facility installation date, historic land-cover classification and proximity to vulnerable areas allows us to show that most of the PV solar energy facilities are sited on cropland, followed by arid lands and grassland. Our inventory could aid PV delivery aligned with the Sustainable Development Goals.
Energy Resource Land Use Planning - Kruitwagen_etal_Nature.pdf - All Documents (sharepoint.com)
Metadata: (4) BLM Renewable Project
To identify renewable energy approved and pending lease areas on BLM administered lands. To provide information about solar and wind energy applications and completed projects within the State of California for analysis and display internally and externally.
This feature class denotes "verified" renewable energy projects at the California State BLM Office, displayed in GIS. The term "Verified" refers to the GIS data being constructed at the California State Office, using the actual application/maps with legal descriptions obtained from the renewable energy company.
https://www.blm.gov/wo/st/en/prog/energy/renewable_energy
https://www.blm.gov/style/medialib/blm/wo/MINERALS_REALTY_AND_RESOURCE_PROTECTION_/energy/solar_and_wind.Par.70101.File.dat/Public%20Webinar%20Dec%203%202014%20-%20Solar%20and%20Wind%20Regulations.pdf
BLM CA Renewable Energy Projects | BLM GBP Hub (arcgis.com)
Metadata: (5) Quarterly Fuel and Energy Report (QFER)
California Power Plants - Overview (arcgis.com)
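The Solar Footprints description above mentions a percentile category for each footprint's distance to the closest substation. One way such categories could be derived is sketched below with quartile cut points from the standard library. The distances are made-up values, and the last two labels extend the two examples given in the description, so both are assumptions rather than the dataset's actual method or values.

```python
import bisect
import statistics

# The first two labels appear in the description; the last two are assumed.
LABELS = [
    "0-25th percentile",
    "25th-50th percentile",
    "50th-75th percentile",
    "75th-100th percentile",
]

def percentile_categories(distances):
    """Assign each distance to a quartile-based percentile category."""
    # With n=4, statistics.quantiles returns the three quartile cut points.
    cuts = statistics.quantiles(distances, n=4)
    return [LABELS[bisect.bisect_left(cuts, d)] for d in distances]

# Hypothetical distances from footprints to the closest substation.
distances = [0.5, 1.0, 2.0, 3.5, 5.0, 8.0, 12.0, 20.0]
categories = percentile_categories(distances)
for d, c in zip(distances, categories):
    print(d, c)
```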
The documented dataset covers Enterprise Survey (ES) panel data collected in Niger in 2005, 2009 and 2016, as part of Africa Enterprise Surveys rollout, an initiative of the World Bank. The objective of the survey is to obtain feedback from enterprises on the state of the private sector as well as to help in building a panel of enterprise data that will make it possible to track changes in the business environment over time, thus allowing, for example, impact assessments of reforms.
Through interviews with firms in the manufacturing and services sectors, the survey assesses the constraints to private sector growth and creates statistically significant business environment indicators that are comparable across countries. Only registered businesses are surveyed in the Enterprise Survey.
Data from 151 establishments was analyzed. Stratified random sampling was used to select the surveyed businesses. The data was collected using face-to-face interviews.
The standard Enterprise Survey topics include firm characteristics, gender participation, access to finance, annual sales, costs of inputs and labor, workforce composition, bribery, licensing, infrastructure, trade, crime, competition, capacity utilization, land and permits, taxation, informality, business-government relations, innovation and technology, and performance measures. Over 90 percent of the questions objectively ascertain characteristics of a country’s business environment. The remaining questions assess the survey respondents’ opinions on what are the obstacles to firm growth and performance.
National
The primary sampling unit of the study is an establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must make its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.
The whole population, or the universe, covered in the Enterprise Surveys is the non-agricultural private economy. It comprises: all manufacturing sectors according to the ISIC Revision 3.1 group classification (group D), construction sector (group F), services sector (groups G and H), and transport, storage, and communications sector (group I). Note that this population definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except sub-sector 72, IT, which was added to the population under study), and all public or utilities sectors. Companies with 100% government ownership are not eligible to participate in the Enterprise Surveys.
Sample survey data [ssd]
Three levels of stratification were used in this country: industry, establishment size, and region.
Industry stratification was designed as follows: the universe was stratified into manufacturing and services industries: Manufacturing (ISIC Rev. 3.1 codes 15 - 37), and Services (ISIC codes 45, 50-52, 55, 60-64, and 72).
For the 2009 sample stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. Size stratification was defined following the standardized definition used for the Enterprise Surveys: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). Regional stratification was defined in terms of the geographic regions with the largest commercial presence in the country: Maradi and Niamey were the two areas selected in Niger.
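The size classes above can be written as a small lookup. This is only an illustrative sketch; the function name and the `None` return for establishments below five workers are assumptions, since the text does not say how such cases are handled.

```python
def size_stratum(permanent_full_time_workers):
    """Return the size stratum for a count of permanent full-time workers."""
    n = permanent_full_time_workers
    if n < 5:
        return None  # below the smallest size class listed above
    if n <= 19:
        return "small"
    if n <= 99:
        return "medium"
    return "large"

for workers in (4, 5, 19, 20, 99, 100):
    print(workers, size_stratum(workers))
```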
Two frames were used for Niger. The first one included official lists from the Chamber of commerce, craft and industries of Niger 2008 and the Repertoire of Companies (2008) operating in Niger. The second frame (the panel sample) consisted of enterprises interviewed for the Enterprise Survey in 2005, which were to be re-interviewed where they were in the selected geographical regions and met eligibility criteria. Both databases contained the following information:
- Name of the firm
- Contact details
- ISIC code
- Number of employees
Given the impact that non-eligible units included in the sample universe may have on the results, adjustments may be needed when computing the appropriate weights for individual observations. The percentage of confirmed non-eligible units as a proportion of the total number of sampled establishments contacted for the survey was 39.9% (134 out of 344 establishments). Breaking down by industry, the following numbers of establishments were surveyed: Manufacturing - 52, Services - 98.
For 2017: Regional stratification for the Niger ES was done across two regions: Niamey and Rest of the Country.
The sample frame consisted of listings of firms from three sources: - the list of 150 firms from the Niger 2009 ES for panel firms - firm data from La Caisse Nationale de Sécurité Sociale (CNSS) and a list of exporting firms by the Institut National des Statistiques (INS) for fresh firms (firms not covered in 2009).
Given the impact that non-eligible units included in the sample universe may have on the results, adjustments may be needed when computing the appropriate weights for individual observations. The percentage of confirmed non-eligible units as a proportion of the total number of sampled establishments contacted for the survey was 18.6% (76 out of 409 establishments).
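The non-eligibility share quoted above is a simple ratio of confirmed non-eligible units to contacted establishments. The sketch below reproduces only that ratio from the counts given. How the share then feeds into the observation weights is not described here, so no weighting step is shown.

```python
def non_eligible_share(non_eligible, contacted):
    """Percentage of contacted establishments confirmed as non-eligible."""
    if contacted <= 0:
        raise ValueError("contacted must be positive")
    return 100.0 * non_eligible / contacted

# Counts quoted above: 76 non-eligible out of 409 contacted establishments.
share = non_eligible_share(76, 409)
print(round(share, 1))
```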
Face-to-face [f2f]
Data entry and quality controls are implemented by the contractor and data is delivered to the World Bank in batches (typically 10%, 50% and 100%). These data deliveries are checked for logical consistency, out of range values, skip patterns, and duplicate entries. Problems are flagged by the World Bank and corrected by the implementing contractor through data checks, callbacks, and revisiting establishments.
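Two of the checks listed above, duplicate entries and out-of-range values, are easy to illustrate. The following is a minimal sketch and not the World Bank's actual tooling; the record layout, the field name `capacity_pct` and the 0-100 bounds are hypothetical.

```python
from collections import Counter

def check_batch(records, id_key, field, low, high):
    """Flag duplicate ids and out-of-range values in one delivery batch."""
    counts = Counter(r[id_key] for r in records)
    duplicates = sorted(i for i, c in counts.items() if c > 1)
    out_of_range = [r[id_key] for r in records
                    if not (low <= r[field] <= high)]
    return {"duplicates": duplicates, "out_of_range": out_of_range}

# Hypothetical batch with one repeated id and one impossible percentage.
batch = [
    {"id": 101, "capacity_pct": 80},
    {"id": 102, "capacity_pct": 140},  # outside the 0-100 range
    {"id": 101, "capacity_pct": 75},   # repeated id
]
report = check_batch(batch, "id", "capacity_pct", 0, 100)
print(report)
```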
Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.
Item non-response was addressed by two strategies: a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect "Refusal to respond" (-8) as a different option from "Don't know" (-9). b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary.
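When analysing the data, the two item non-response codes need to be kept out of any numeric summary, otherwise -8 and -9 would be averaged in as if they were answers. A minimal sketch follows, with a made-up list of answers standing in for one numeric survey question.

```python
REFUSAL = -8     # "Refusal to respond"
DONT_KNOW = -9   # "Don't know"

def split_responses(values):
    """Separate valid answers from the two item non-response codes."""
    valid = [v for v in values if v not in (REFUSAL, DONT_KNOW)]
    return {
        "valid": valid,
        "refused": sum(1 for v in values if v == REFUSAL),
        "dont_know": sum(1 for v in values if v == DONT_KNOW),
    }

# Hypothetical answers to one numeric question.
answers = [3, -9, 1, -8, 4, -8]
summary = split_responses(answers)
mean_valid = sum(summary["valid"]) / len(summary["valid"])
print(summary, mean_valid)
```

Counting the two codes separately preserves the distinction the enumerators were asked to record.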
Survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals.
https://creativecommons.org/publicdomain/zero/1.0/
🚀 Data Science Careers in 2025: Jobs and Salary Trends in Pakistan 🚀 Data Science is one of the fastest-growing fields, and by 2025, the demand for skilled professionals in Pakistan will only increase. If you’re considering a career in Data Science, here’s what you need to know about the top jobs and salary trends.
🔍 Top Data Science Jobs in 2025
1) Data Scientist
Avg Salary: PKR 1.2M - 2.5M/year (Entry-Level), PKR 3M - 6M/year (Experienced)
Skills: Python, R, Machine Learning, Data Visualization
2) Data Analyst
Avg Salary: PKR 800K - 1.5M/year (Entry-Level), PKR 2M - 3.5M/year (Experienced)
Skills: SQL, Excel, Tableau, Power BI
3) Machine Learning Engineer
Avg Salary: PKR 1.5M - 3M/year (Entry-Level), PKR 4M - 7M/year (Experienced)
Skills: TensorFlow, PyTorch, Deep Learning, NLP
4) Business Intelligence Analyst
Avg Salary: PKR 1M - 2M/year (Entry-Level), PKR 2.5M - 4M/year (Experienced)
Skills: Data Warehousing, ETL, Dashboarding
5) AI Research Scientist
Avg Salary: PKR 2M - 4M/year (Entry-Level), PKR 5M - 10M/year (Experienced)
Skills: AI Algorithms, Research, Advanced Mathematics
💡 Why Choose Data Science?
High Demand: Every industry in Pakistan needs data professionals.
Attractive Salaries: Competitive pay based on technical expertise.
Growth Opportunities: Unlimited career growth in this field.
📈 Salary Trends
Entry-Level: PKR 800K - 1.5M/year
Mid-Level: PKR 2M - 4M/year
Senior-Level: PKR 5M+ (depending on expertise and industry)
🛠️ How to Get Started?
Learn Skills: Focus on Python, SQL, Machine Learning, and Data Visualization.
Build Projects: Work on real-world datasets to create a strong portfolio.
Network: Connect with industry professionals and join Data Science communities.
work_year: The year in which the data was recorded. This field indicates the temporal context of the data, important for understanding salary trends over time.
job_title: The specific title of the job role, like 'Data Scientist', 'Data Engineer', or 'Data Analyst'. This column is crucial for understanding the salary distribution across various specialized roles within the data field.
job_category: A classification of the job role into broader categories for easier analysis. This might include areas like 'Data Analysis', 'Machine Learning', 'Data Engineering', etc.
salary_currency: The currency in which the salary is paid, such as USD, EUR, etc. This is important for currency conversion and understanding the actual value of the salary in a global context.
salary: The annual gross salary of the role in the local currency. This raw salary figure is key for direct regional salary comparisons.
salary_in_usd: The annual gross salary converted to United States Dollars (USD). This uniform currency conversion aids in global salary comparisons and analyses.
employee_residence: The country of residence of the employee. This data point can be used to explore geographical salary differences and cost-of-living variations.
experience_level: Classifies the professional experience level of the employee. Common categories might include 'Entry-level', 'Mid-level', 'Senior', and 'Executive', providing insight into how experience influences salary in data-related roles.
employment_type: Specifies the type of employment, such as 'Full-time', 'Part-time', 'Contract', etc. This helps in analyzing how different employment arrangements affect salary structures.
work_setting: The work setting or environment, like 'Remote', 'In-person', or 'Hybrid'. This column reflects the impact of work settings on salary levels in the data industry.
company_location: The country where the company is located. It helps in analyzing how the location of the company affects salary structures.
company_size: The size of the employer company, often categorized into small (S), medium (M), and large (L) sizes. This allows for analysis of how company size influences salary.
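As a rough illustration of how these columns fit together, the sketch below groups `salary_in_usd` by `experience_level` and takes the median of each group. The rows are hypothetical sample data written inline so the example runs on its own; only the column names come from the description above, and a real file would be read from disk instead.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Hypothetical rows using a subset of the documented columns.
sample_csv = """work_year,job_title,experience_level,salary_in_usd,company_size
2023,Data Scientist,Senior,150000,L
2023,Data Analyst,Entry-level,60000,M
2023,Data Engineer,Senior,130000,L
2022,Data Analyst,Entry-level,50000,S
"""

# Collect USD salaries per experience level.
by_level = defaultdict(list)
for row in csv.DictReader(io.StringIO(sample_csv)):
    by_level[row["experience_level"]].append(int(row["salary_in_usd"]))

# Median is less sensitive to a few very high salaries than the mean.
median_by_level = {level: median(values) for level, values in by_level.items()}
print(median_by_level)
```

Using `salary_in_usd` rather than `salary` keeps the comparison in a single currency, which is the purpose the description gives for that column.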
Amazon AWS - Cloud Platforms & Services
Companies using Amazon AWS
We have data on 1,070,574 companies that use Amazon AWS. The companies using Amazon AWS are most often found in the United States and in the Computer Software industry. Amazon AWS is most often used by companies with 10-50 employees and 1M-10M dollars in revenue. Our data for Amazon AWS usage goes back as far as 2 years and 1 month.
What is Amazon AWS?
Amazon Web Services (AWS) is a collection of remote computing services, also called web services, that make up a cloud computing platform offered by Amazon.com.
Top Industries that use Amazon AWS
Looking at Amazon AWS customers by industry, we find that Computer Software (6%) is the largest segment.
Distribution of companies using Amazon AWS by Industry
• Computer Software - 67,537 companies
• Hospitals & Healthcare - 54,293 companies
• Retail - 39,543 companies
• Information Technology and Services - 35,382 companies
• Real Estate - 31,676 companies
• Restaurants - 30,302 companies
• Construction - 29,207 companies
• Automotive - 28,469 companies
• Financial Services - 23,680 companies
• Education Management - 21,548 companies
Top Countries that use Amazon AWS
49% of Amazon AWS customers are in the United States and 7% are in the United Kingdom.
Distribution of companies using Amazon AWS by country
• United States – 616 2275 companies
• United Kingdom – 68,219 companies
• Australia – 44,601 companies
• Canada – 42,770 companies
• Germany – 31,541 companies
• India – 30,949 companies
• Netherlands – 19,543 companies
• Brazil – 17,165 companies
• Italy – 14,876 companies
• Spain – 14,675 companies
Contact information fields include:
• Company Name
• Business contact number
• Title
• Name
• Email Address
• Country, State, City, Zip Code
• Phone, Mobile and Fax
• Website
• Industry
• SIC & NAICS Code
• Employee Size
• Revenue Size
• And more…
Why Buy AWS Users List from DataCaptive?
• More than 1,070,574 companies
• Responsive database
• Customizable as per your requirements
• Email and Tele-verified list
• Team of 100+ market researchers
• Authentic data sources
What’s in it for you?
Here are a few advantages of choosing us:
• Locate, target, and prospect leads from 170+ countries
• Design and execute ABM and multi-channel campaigns
• Seamless and smooth pre- and post-sale customer service
• Connect with old leads and build a fruitful customer relationship
• Analyze the market for product development and sales campaigns
• Boost sales and ROI with increased customer acquisition and retention
Our security compliance
We comply with globally recognized data laws such as GDPR, CCPA, ACMA, EDPS, CAN-SPAM and ANTI CAN-SPAM to ensure the privacy and security of our database. We engage certified auditors to validate our security and privacy practices and to issue certificates attesting to our compliance.
Our USPs: what makes us your ideal choice?
At DataCaptive™, we strive consistently to improve our services and cater to the needs of businesses around the world while keeping up with industry trends.
• Elaborate data mining from credible sources
• 7-tier verification, including manual quality check
• Strict adherence to global and local data policies
• Guaranteed 95% accuracy or cash-back
• Free sample database available on request
Guaranteed benefits of our Amazon AWS users email database!
85% email deliverability and 95% accuracy on other data fields
We understand the importance of data accuracy and employ every avenue to keep our database fresh and updated. We execute a multi-step QC process backed by our patented AI and machine learning tools to prevent anomalies in consistency and data precision. This cycle repeats every 45 days. Although maintaining 100% accuracy is impractical, since data such as email addresses, physical addresses, and phone numbers are subject to change, we guarantee 85% email deliverability and 95% accuracy on other data points.
100% replacement in case of hard bounces
Every data point is meticulously verified and then re-verified to ensure you get the best. Data accuracy is paramount in successfully penetrating a new market or working within a familiar one. We are committed to precision. However, in the unlikely event that hard bounces or inaccuracies exceed the guaranteed percentage, we offer replacement with immediate effect. If need be, we also offer credits and/or refunds for inaccurate contacts.
Other promised benefits
• Contacts are for perpetual use
• The database comprises consent-based opt-in contacts only
• The list is free of duplicate contacts and generic emails
• Round-the-clock customer service assistance
• 360-degree database solutions
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Top 10 sectors of weighted clustering coefficient in GIPCN.
🌍 Global B2B Leads Data | 170M Emails + 100M Mobile Numbers | 95% Accuracy | API & Bi-Weekly Updates Fuel your sales pipeline with the world’s largest, most accurate B2B contact database—verified, actionable, and refreshed every two weeks.
The Forager.ai Global B2B Leads Dataset delivers 170M+ verified emails and 100M+ mobile numbers, all validated for 95%+ accuracy and updated bi-weekly. Ideal for cold outreach, CRM enrichment, and hyper-targeted campaigns, this dataset covers decision-makers across industries, company sizes, and geographies.
📊 Key Features
✅ 270M+ Total Contacts – One of the largest B2B leads databases available.
✅ 95% Accuracy Guarantee – AI-validated emails & mobile numbers.
✅ Bi-Weekly Updates – Fresh data to reduce bounce rates.
✅ Global Coverage – North America, Europe, APAC & emerging markets.
📋 Core Data Fields:
✔ Professional/personal Emails (170M+)
✔ Mobile Numbers (100M+) – Direct lines for higher response rates
✔ Full Name, Job Title, Seniority Level
✔ Company Name, Industry, Revenue, Employee Size
✔ Location (Country, City, LinkedIn URL)
🎯 Top Use Cases
🔹 High-Volume Cold Outreach
Launch email/SMS campaigns with verified contacts.
Reduce bounce rates with 95% accurate data.
🔹 CRM & Prospecting Tools
Enrich Salesforce, HubSpot, or Outreach.io instantly.
Build targeted lead lists using firmographics.
🔹 ABM & Intent Data
Layer contacts with technographics for precision targeting.
Track account movements and job changes.
🔹 Recruitment & Partnerships
Source executive and candidate contact profiles.
Map organizational hierarchies.
⚡ Delivery & Integration
REST API – Real-time access for sales tools.
CSV/JSON Files – Bulk delivery via S3, Wasabi, or Snowflake.
Custom Feeds – Managed database solutions.
🔒 Data Quality & Compliance
GDPR-Compliant – Ethically sourced, legally compliant.
Suppression Lists – Auto-remove opt-outs and hard bounces.
🚀 Why Forager.ai?
✔ Highest Accuracy (95%) – Industry-leading verification.
✔ Built for Sales Teams – Optimized for cold email/SMS performance.
✔ Enterprise-Grade Freshness – Bi-weekly updates = fewer dead leads.
✔ Dedicated Support – SLA-backed onboarding & troubleshooting.
Tags: B2B Leads | Personal / Work Email Database | Mobile Numbers | Sales Prospecting | CRM Enrichment | Cold Outreach | 95% Accuracy | API Integration
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Application and use cases
1) Market Analysis: Evaluate overall trends and regional variations in car sales to assess manufacturer performance, model preferences, and demographic insights.
2) Seasonal Patterns and Competitor Analysis: Investigate seasonal and cyclical patterns in sales.
3) Forecasting and Predictive Analysis: Use historical data to forecast and predict future market trends. Support marketing, advertising, and investment decisions based on insights.
4) Supply Chain and Inventory Optimization: Provide valuable data for stakeholders in the automotive industry.
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.
Storage capacity also growing
Only a small percentage of this newly created data is kept though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
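For readers unfamiliar with the term, a compound annual growth rate (CAGR) is the constant yearly rate that would take a starting value to an ending value over a given number of years. The sketch below shows the standard formula. The start and end values are hypothetical placeholders, since the actual figures are masked in the text above.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Hypothetical example: a quantity that doubles over a five-year period.
rate = cagr(100.0, 200.0, 5)
print(f"{rate:.1%}")  # prints 14.9%
```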