https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de450261
Abstract (en): Dun's Review began publishing monthly data on business failures by branch of business during the 1890s. At that time, a business failure was defined as a concern involved in a court proceeding or voluntary action that was likely to end in loss to creditors. Liabilities of failed businesses were defined "as all liabilities except long-term publicly-held obligations, chiefly bonds." Dun's published data on failures by branch of business from 1895 through 1935. This dataset reconstructs that series and links it to its successors. The successor series include data on business failures by division of industry, which Dun and Bradstreet published from 1934 through 1940.

This study includes six parts. Part One contains aggregate liabilities in dollars, broken down by branch, month, and year. Part Two contains aggregate numbers of business failures broken down by branch, month, and year. Part Three contains aggregate liabilities in dollars broken down by division, month, and year. Part Four contains aggregate numbers of business failures broken down by division, month, and year. Part Five contains aggregate liabilities broken down by sector, month, and year. Part Six contains aggregate numbers of business failures broken down by sector, month, and year. Part One and Part Two contain 36 variables and 562 cases. Part Three and Part Four contain 51 variables and 60 cases. Part Five and Part Six contain 6 variables and 562 cases.

This study allows for economic analysis of business failures. It is intended to provide a resource on business failures and liabilities from 1895 to 1940. Data were originally collected from court filings at municipal, county, state, and United States district courthouses throughout the United States from 1895 through 1940, and were published periodically by R. G. Dun and Company, Bradstreet's Company, and their successors through 1940. From their publications, the principal investigators collected, cleaned, compiled, and computerized the current data series. Variables include monthly, unadjusted liabilities and monthly, unadjusted numbers of failures for different branches, sectors, and divisions.

ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: checked for undocumented or out-of-range codes.

The data cover businesses that failed in the United States from 1895 through 1940. Smallest Geographic Unit: United States. The data consist of the aggregate number of corporations filing for bankruptcy in various industries each month in the United States and the total liabilities of those corporations. Please refer to the codebook for sampling information in the "Original P.I. Documentation" section. Additional information can be found on the National Bureau of Economic Research (NBER) Web site. The dates in the Original P.I. Documentation for Business Failures by Industry in the United States range from 1895 to 1939; however, the data range from 1895 to 1940. The title for ICPSR 34016 has been changed to reflect the data.
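A minimal sketch of working with one of the six parts, assuming Part One has been exported to CSV from the ICPSR release; the file name and column names below are hypothetical, and the actual variable names are documented in the ICPSR 34016 codebook:

```python
import pandas as pd

# Hypothetical export of Part One (aggregate liabilities by branch, month, and year).
# Consult the ICPSR 34016 codebook for the real file layout and variable names.
liab = pd.read_csv("icpsr34016_part1_liabilities.csv")

# Reshape from one column per branch to a long format, then total liabilities per year.
long = liab.melt(id_vars=["YEAR", "MONTH"], var_name="BRANCH", value_name="LIABILITIES")
annual = long.groupby("YEAR")["LIABILITIES"].sum()
print(annual.head())
```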
The number of small and medium-sized enterprises in the United States was forecast to decrease continuously between 2024 and 2029 by a total of 6.7 thousand enterprises (-2.24 percent). After fourteen consecutive years of decline, the number is estimated to reach 291.94 thousand enterprises in 2029, a new minimum. According to the OECD, an enterprise is defined as the smallest combination of legal units forming an organisational unit that produces goods or services and benefits from a degree of autonomy with regard to the allocation of resources and decision making. Shown here are small and medium-sized enterprises, defined as companies with 1-249 employees. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in more than 150 countries and regions worldwide. All input data are sourced from international institutions, national statistical offices, and trade associations. All data are processed to generate comparable datasets (see supplementary notes under details for more information).
The Los Angeles BusinessSource Centers provide startup ventures and current small business owners with various cost-effective tools to make their businesses a success. Through these tools, small businesses can grow and remain competitive within the City of Los Angeles. Startup services focus on owners of businesses with five (5) or fewer employees, one of whom owns the enterprise, and with net operating income of less than Two Hundred Thousand Dollars ($200,000). This focus is particularly important because the majority of the businesses within the City may be categorized as "survivors," and historically, many such businesses fail in their first two years of operation. The survival and growth of such businesses remain very important to the ongoing economic vitality of the City.
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Registered business locations in San Francisco maintained by the Office of Treasurer-Tax Collector, including business locations that have been sold, closed, or moved out of San Francisco. Each registered business can have one or many locations. Each record represents a single location.
This dataset updates weekly. It is scheduled for Tuesdays but may fail and retry the next day.
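A minimal sketch of how the one-to-many relationship between businesses and locations can be recovered from the location-level records, assuming a CSV export; the file and column names are hypothetical:

```python
import pandas as pd

# Hypothetical CSV export of the registered business locations dataset.
# Column names ("business_account_number", "location_id") are illustrative only.
locations = pd.read_csv("registered_business_locations.csv")

# Because each record is a single location, counting rows per business account
# recovers how many locations each registered business has.
locations_per_business = locations.groupby("business_account_number")["location_id"].count()
print(locations_per_business.describe())
```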
The data is a daily time-series of print media coverage of three publicly listed corporations which have specialised to some extent in competing for government contracts. The three companies are Capita PLC, Serco PLC and G4S PLC. The data covers the period from January 1 2005 to June 30 2015. Each entry is an article with information on date, source, number of company mentions, and a human-coded measure of article sentiment. Public Service Corporations (PSCs) such as Serco, Capita and G4S are large businesses that are relatively specialised in the delivery of public services. They have been subject to very little direct analysis by policy scholars, and this project starts to fill that gap. Over recent years there have been a number of contract failures, for example relating to the Olympic security contract, GP out-of-hours services, allegations of abuse of inmates in immigration detention centres, and overcharging on contracts for prisoner tagging. Such failures damage companies' reputations and may hurt their hopes of further public contracts. This project will analyse the effect on PSC share prices of public scrutiny of failures and controversies. Responsiveness of PSC share price to public scrutiny may constitute a new type of political accountability, giving the corporations an incentive to improve performance, or at least avoid public scandal. The effect of public scrutiny on share price may also shape the strategic relationship between Government and its large contractors. Newspaper articles were downloaded from the Nexis database. The search terms specified two or more mentions of the company, or the company name featured in the article title. For G4S, the terms Securicor and Group 4 were also searched because the start of the research period coincides with their merger. A subset of publications was selected for coding and data was entered directly into a spreadsheet. Duplicate articles were excluded, as were sponsored articles and a large number of false positives for the Capita search term and the Group 4 term. We also excluded articles related to security van robberies (mostly associated with the Securicor search term).
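A minimal sketch of collapsing the article-level records into a daily time-series per company, assuming a CSV export with hypothetical column names (the deposited variable names may differ):

```python
import pandas as pd

# Hypothetical export: one row per coded article with date, company,
# number of company mentions, and coded sentiment.
articles = pd.read_csv("psc_media_coverage.csv", parse_dates=["date"])

# Aggregate to one row per company per day: article count, total mentions, mean sentiment.
daily = (
    articles
    .groupby(["company", pd.Grouper(key="date", freq="D")])
    .agg(n_articles=("sentiment", "size"),
         total_mentions=("mentions", "sum"),
         mean_sentiment=("sentiment", "mean"))
    .reset_index()
)
print(daily.head())
```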
Title: SECOM Data Set
Abstract: Data from a semi-conductor manufacturing process
Data Set Characteristics: Multivariate Number of Instances: 1567 Area: Computer Attribute Characteristics: Real Number of Attributes: 591 Date Donated: 2008-11-19 Associated Tasks: Classification, Causal-Discovery Missing Values? Yes
Source:
Authors: Michael McCann, Adrian Johnston
Data Set Information:
A complex modern semi-conductor manufacturing process is normally under consistent surveillance via the monitoring of signals/variables collected from sensors and/or process measurement points. However, not all of these signals are equally valuable in a specific monitoring system. The measured signals contain a combination of useful information, irrelevant information, and noise, and the useful information is often buried in the latter two. Engineers typically have a much larger number of signals than are actually required. If we consider each type of signal as a feature, then feature selection may be applied to identify the most relevant signals. The Process Engineers may then use these signals to determine key factors contributing to yield excursions downstream in the process. This will enable an increase in process throughput, decreased time to learning, and reduced per-unit production costs.
To enhance current business improvement techniques, the application of feature selection as an intelligent systems technique is being investigated.
The dataset presented in this case represents a selection of such features, where each example represents a single production entity with its associated measured features, a label representing a simple pass/fail yield for in-house line testing, and an associated date-time stamp. A label of -1 corresponds to a pass and 1 corresponds to a fail, and the date-time stamp is for that specific test point.
Using feature selection techniques, it is desired to rank the features according to their impact on the overall yield for the product; causal relationships may also be considered with a view to identifying the key features.
Results may be submitted in terms of feature relevance for predictability, using error rates as the evaluation metric. It is suggested that cross-validation be applied to generate these results. Some baseline results are shown below for basic feature selection techniques using a simple kernel ridge classifier and 10-fold cross-validation.
Baseline Results: Pre-processing objects were applied to the dataset simply to standardize the data and remove the constant features, and then a number of different feature selection objects, each selecting the 40 highest-ranked features, were applied with a simple classifier to achieve some initial results. 10-fold cross-validation was used and the balanced error rate (BER) was generated as the initial performance metric to help investigate this dataset.
SECOM Dataset: 1567 examples, 591 features, 104 fails

FSmethod (40 features)    BER %        True + %      True - %
S2N (signal to noise)     34.5 +-2.6   57.8 +-5.3    73.1 +-2.1
Ttest                     33.7 +-2.1   59.6 +-4.7    73.0 +-1.8
Relief                    40.1 +-2.8   48.3 +-5.9    71.6 +-3.2
Pearson                   34.1 +-2.0   57.4 +-4.3    74.4 +-4.9
Ftest                     33.5 +-2.2   59.1 +-4.8    73.8 +-1.8
Gram Schmidt              35.6 +-2.4   51.2 +-11.8   77.5 +-2.3
Attribute Information:
Key facts: Data Structure: The data consists of 2 files: the dataset file SECOM, consisting of 1567 examples each with 591 features (a 1567 x 591 matrix), and a labels file containing the classification and date-time stamp for each example.
As with any real-life data situation, this data contains null values varying in intensity depending on the individual features. This needs to be taken into consideration when investigating the data, either through pre-processing or within the technique applied.
The data is represented in a raw text file, with each line representing an individual example and the features separated by spaces. Null values are represented by the 'NaN' value, as in MATLAB.
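A minimal sketch of the baseline protocol described above (standardize, drop constant features, select the 40 highest-ranked features, fit a simple classifier, and estimate the balanced error rate with 10-fold cross-validation), assuming the UCI distribution files secom.data and secom_labels.data and scikit-learn; RidgeClassifier stands in for the kernel ridge classifier, and median imputation of the NaN values is one of several reasonable choices:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import VarianceThreshold, SelectKBest, f_classif
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Features: space-separated floats, with 'NaN' marking missing values.
X = np.loadtxt("secom.data")
# Labels file: first column is -1 (pass) / 1 (fail); remaining columns are the timestamp.
y = np.genfromtxt("secom_labels.data", usecols=[0])

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # fill NaN values
    ("scale", StandardScaler()),                    # standardize each feature
    ("drop_const", VarianceThreshold()),            # remove constant features
    ("select", SelectKBest(f_classif, k=40)),       # 40 highest-ranked features (F-test)
    ("clf", RidgeClassifier(class_weight="balanced")),
])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
bal_acc = cross_val_score(pipe, X, y, cv=cv, scoring="balanced_accuracy")
ber = 1.0 - bal_acc  # balanced error rate = 1 - balanced accuracy
print(f"BER: {ber.mean():.3f} +- {ber.std():.3f}")
```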
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As the fashion e-commerce markets rapidly develop, tens of thousands of products are registered daily on e-commerce platforms. Individual sellers register products after setting up a product category directly on a fashion e-commerce platform. However, many sellers fail to find a suitable category and mistakenly register their products under incorrect ones. Precise category matching is important for increasing sales through search optimization and accurate product exposure. However, manually correcting registered categories is time-consuming and costly for platform managers. To resolve this problem, this study proposes a methodology for fashion e-commerce product classification based on multi-modal deep learning and transfer learning. Through the proposed methodology, three challenges in classifying fashion e-commerce products are addressed. First, the issue of extremely biased e-commerce data is addressed through under-sampling. Second, multi-modal deep learning enables the model to simultaneously use input data in different formats, which helps mitigate the impact of noisy and low-quality e-commerce data by providing richer information. Finally, the high computational cost and long training times involved in training deep learning models with both image and text data are mitigated by leveraging transfer learning. In this study, three strategies for transfer learning to fine-tune the image and text modules are presented. In addition, five methods for fusing feature vectors extracted from a single modality into one and six strategies for fine-tuning multi-modal models are presented, for a total of 14 strategies. The study shows that multi-modal models outperform unimodal models based solely on text or image. It also suggests the optimal conditions for classifying e-commerce products, helping fashion e-commerce practitioners construct models tailored to their respective business environments more efficiently.
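As an illustration only (not the authors' implementation), the sketch below shows one simple fusion strategy: concatenating an image feature vector from a frozen pretrained backbone with a precomputed text embedding before a small classification head. It assumes PyTorch and torchvision, and all dimensions and module names are hypothetical:

```python
import torch
import torch.nn as nn
from torchvision import models

class MultiModalClassifier(nn.Module):
    """Concatenation fusion of image and text features (illustrative only)."""
    def __init__(self, text_dim=768, num_classes=50):
        super().__init__()
        # Pretrained image backbone (transfer learning), final FC layer removed.
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.image_encoder = nn.Sequential(*list(backbone.children())[:-1])
        for p in self.image_encoder.parameters():
            p.requires_grad = False  # freeze the backbone; train only the head
        self.head = nn.Sequential(
            nn.Linear(512 + text_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, image, text_embedding):
        img_feat = self.image_encoder(image).flatten(1)       # (B, 512)
        fused = torch.cat([img_feat, text_embedding], dim=1)  # simple concatenation fusion
        return self.head(fused)

# Example: a batch of 4 product images and precomputed text embeddings.
model = MultiModalClassifier()
logits = model(torch.randn(4, 3, 224, 224), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 50])
```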
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study is about what matters: predicting when microfinance institutions might fail, especially in places where financial stability is closely linked to economic inclusion. The challenge? Creating something practical and usable. This is where the Adjusted Gross Granular Model (ARGM) comes in. It combines techniques such as granular computing and machine learning to handle messy and imbalanced data, ensuring that the model is not just a theoretical concept but a practical tool that can be used in the real world. Data from 56 financial institutions in Peru was analyzed over almost a decade (2014-2023), and the results were quite promising. The model was nearly 90% accurate in detecting failures and right more than 95% of the time in identifying safe institutions. What does this mean in practice? When tested, it flagged six institutions (20% of the total) as high risk. This tool's impact on emerging markets would be very significant: financial regulators could act in advance with this model, potentially preventing financial disasters. This is not just a theoretical exercise but a practical solution to a pressing problem in these markets, where every failure has domino effects on small businesses and clients in local communities, who may see their life savings affected and lost due to the failure of these institutions. Ultimately, this research is not just about a machine learning model or using statistics to evaluate results. It is about giving regulators and supervisors of financial institutions a tool they can rely on to help them take action before it is too late, when microfinance institutions get into bad financial shape, and to make immediate decisions in the event of a possible collapse.
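For illustration only, and not the ARGM itself: the sketch below shows how the two figures reported above map onto sensitivity (the failure-detection rate) and specificity (the rate of correctly identifying safe institutions) when evaluating any classifier on imbalanced data, using synthetic data and a generic class-weighted model from scikit-learn:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, illustrative data: a rare "failure" class, as in the microfinance setting.
rng = np.random.default_rng(0)
X = rng.normal(size=(560, 8))
y = (X[:, 0] + rng.normal(scale=0.8, size=560) > 1.3).astype(int)  # ~15% failures

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(class_weight="balanced", random_state=0).fit(X_tr, y_tr)

tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
sensitivity = tp / (tp + fn)   # share of failing institutions correctly flagged
specificity = tn / (tn + fp)   # share of safe institutions correctly identified
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```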