This is a dataset I found online through the Google Dataset Search portal.
The American Community Survey (ACS) 2009-2013 multi-year data are used to list all languages spoken in the United States that were reported during the sample period. These tables provide detailed counts of many more languages than the 39 languages and language groups that are published annually as a part of the routine ACS data release. This is the second tabulation beyond 39 languages since ACS began.
The tables include all languages that were reported in each geography during the 2009 to 2013 sampling period. For the purpose of tabulation, reported languages are classified in one of 380 possible languages or language groups. Because the data are a sample of the total population, there may be languages spoken that are not reported, either because the ACS did not sample the households where those languages are spoken, or because the person filling out the survey did not report the language or reported another language instead.
The tables also provide information about self-reported English-speaking ability. Respondents who reported speaking a language other than English were asked to rate their ability to speak English in one of the following categories: "Very well," "Well," "Not well," or "Not at all." The data on English-speaking ability represent the person's own perception of his or her ability; however, because ACS questionnaires are usually completed by one household member, the responses may instead reflect the perception of another household member.
These tables are also available through the Census Bureau's application programming interface (API). Please see the developers page for additional details on how to use the API to access these data.
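As a minimal sketch of what an API query might look like (the endpoint and the variable names EST, LANLABEL, and LAN are assumptions based on the language statistics release; verify them against the developers page before use):

```python
import requests

# Query the Census Bureau API for 2009-2013 ACS language estimates.
# The endpoint and variable names (EST, LANLABEL, LAN) are assumptions;
# confirm them on the Census developers page.
url = "https://api.census.gov/data/2013/language"
params = {
    "get": "EST,LANLABEL",  # estimated speakers and language label
    "for": "state:06",      # California
    "LAN": "625",           # example language code
}

response = requests.get(url, params=params, timeout=30)
response.raise_for_status()
for row in response.json():
    print(row)
```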
Sources:
Google Dataset Search: https://toolbox.google.com/datasetsearch
2009-2013 American Community Survey
Original dataset: https://www.census.gov/data/tables/2013/demo/2009-2013-lang-tables.html
Downloaded From: https://data.world/kvaughn/languages-county
Banner and thumbnail photo by Farzad Mohsenvand on Unsplash
https://creativecommons.org/publicdomain/zero/1.0/
This dataset offers a detailed collection of US-GAAP financial data extracted from the financial statements of exchange-listed U.S. companies, as submitted to the U.S. Securities and Exchange Commission (SEC) via the EDGAR database. Covering filings from January 2009 onwards, this dataset provides key financial figures reported by companies in accordance with U.S. Generally Accepted Accounting Principles (GAAP).
This dataset primarily relies on the SEC's Financial Statement Data Sets and EDGAR APIs:
- SEC Financial Statement Data Sets
- EDGAR Application Programming Interfaces
In instances where specific figures were missing from these sources, data was directly extracted from the companies' financial statements to ensure completeness.
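As an illustration, the EDGAR XBRL APIs can be queried over plain HTTP. The sketch below assumes the public data.sec.gov XBRL "companyconcept" endpoint; the CIK (Apple Inc.) and the concept (Revenues) are illustrative placeholders:

```python
import requests

# Fetch all reported values of one US-GAAP concept for one company from the
# SEC's XBRL "companyconcept" API. The CIK (0000320193 = Apple Inc.) and the
# concept (Revenues) are placeholders chosen for illustration.
CIK = "0000320193"
CONCEPT = "Revenues"
url = f"https://data.sec.gov/api/xbrl/companyconcept/CIK{CIK}/us-gaap/{CONCEPT}.json"

# The SEC requires a descriptive User-Agent identifying the requester.
headers = {"User-Agent": "example-research name@example.com"}

data = requests.get(url, headers=headers, timeout=30).json()
for fact in data["units"]["USD"][:5]:
    print(fact["end"], fact["val"], fact["form"])
```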
Please note that the dataset presents financial figures exactly as reported by the companies, which may occasionally include errors. A common issue involves incorrect reporting of scaling factors in the XBRL format. XBRL supports two tag attributes related to scaling: 'decimals' and 'scale.' The 'decimals' attribute indicates the number of significant decimal places but does not affect the actual value of the figure, while the 'scale' attribute adjusts the value by a specific factor.
However, there are several instances, numbering in the thousands, where companies have incorrectly used the 'decimals' attribute (e.g., 'decimals="-6"') under the mistaken assumption that it controls scaling. This is not correct, and as a result, some figures may be inaccurately scaled. This dataset does not attempt to detect or correct such errors; it aims to reflect the data precisely as reported by the companies. A future version of the dataset may be introduced to address and correct these issues.
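To make the distinction concrete, here is a minimal sketch with invented figures: 'scale' is a multiplier applied to the reported value, while 'decimals' only states the precision and leaves the value untouched.

```python
# Sketch of the XBRL 'scale' vs. 'decimals' distinction, using invented
# figures. 'scale' changes the numeric value; 'decimals' only describes
# precision and must NOT be applied as a multiplier.

def xbrl_value(raw_text, scale=None):
    """Return the numeric value of a fact, applying 'scale' if present."""
    value = float(raw_text.replace(",", ""))
    if scale is not None:
        value *= 10 ** scale  # e.g. scale="6" means the value is in millions
    return value

# Correct use of scale: <ix:nonFraction scale="6">1,234</ix:nonFraction>
print(xbrl_value("1,234", scale=6))  # 1234000000.0

# decimals="-6" means "accurate to the nearest million"; value is unchanged:
print(xbrl_value("1234000000"))      # 1234000000.0

# The error described above: reporting 1,234 with decimals="-6" in the
# belief that it scales. Read literally, the value is 1234, not 1.234e9.
print(xbrl_value("1234"))            # 1234.0
```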
The source code for data extraction is available here
Since Investing.com does not offer an API, I decided to develop this Python package to retrieve historical data for the companies that make up the Continuous Spanish Stock Market. I then used investpy to generate a dataset for every company, so that any data scientist or data enthusiast can work with them and draw their own conclusions and research.
The main purpose of developing investpy, the package from which these datasets were retrieved, was to use it as the data extraction tool for its namesake section of my Final Degree Project at the University of Salamanca, titled "*Machine Learning for stock investment recommendation systems*". The package ended up being so consistent, reliable and usable that it will also be used as the main data extraction tool by other students in their Final Degree Projects, "*Recommender system of banking products*" and "*Robo-Advisor Application*".
investpy, the Python package from which these datasets were generated, is currently in a beta development version, so please open an issue if needed to resolve any problems the package may be causing or any dataset errors. New ideas and proposals are also welcome, and will gladly be implemented in the package if they are positive and useful.
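A minimal usage sketch (the function name, parameters, and date format reflect an early beta of investpy and may have changed in later releases; 'bbva' is just an example ticker):

```python
import investpy

# Retrieve historical data for one company on the Spanish Continuous Market.
# Signature and date format follow an early investpy beta and may differ in
# later versions; 'bbva' is an example ticker.
df = investpy.get_historical_data(
    equity="bbva",
    start="01/01/2010",
    end="01/01/2019",
)
print(df.head())
```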
For further information or any question feel free to contact me via email at alvarob96@usal.es
You can also check my Medium publication, where I upload weekly posts related to data science, mainly on data extraction techniques via web scraping. In particular, you can read "investpy — a Python package for historical data extraction from the Spanish stock market", where I explain the basics of investpy's development and share some insights on web scraping with Python.
This Python package was made for research purposes, to fill a need that Investing.com does not cover, so it works like an unofficial Application Programming Interface (API) for Investing.com, developed altruistically. Note that this package is not related in any way to Investing.com or any dependent company; the only requirement for developing it was to mention the source from which the data are retrieved.
https://creativecommons.org/publicdomain/zero/1.0/
Nvidia Corporation is an American multinational technology company incorporated in Delaware and based in Santa Clara, California.
It designs graphics processing units (GPUs) for the gaming and professional markets, as well as system on a chip units (SoCs) for the mobile computing and automotive market.
Its primary GPU line, labeled "GeForce", is in direct competition with the GPUs of the "Radeon" brand by Advanced Micro Devices (AMD). Nvidia expanded its presence in the gaming industry with its handheld game consoles Shield Portable, Shield Tablet, and Shield Android TV and its cloud gaming service GeForce Now.
Its professional line of GPUs is used in workstations for applications in such fields as architecture, engineering and construction, media and entertainment, automotive, scientific research, and manufacturing design.
In addition to GPU manufacturing, Nvidia provides an application programming interface (API) called CUDA that allows the creation of massively parallel programs which utilize GPUs. These are deployed in supercomputing sites around the world. More recently, it has moved into the mobile computing market, where it produces Tegra mobile processors for smartphones and tablets as well as vehicle navigation and entertainment systems. In 2020 it announced an agreement to acquire Arm, though the deal was later abandoned.
# Let us analyze the performance of this solid star!
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 200K Android malware apps, labeled and grouped into their corresponding families. Benign Android apps (200K) collected from the AndroZoo dataset balance this large dataset. We collected 14 malware categories: adware, backdoor, file infector, no category, Potentially Unwanted Apps (PUA), ransomware, riskware, scareware, trojan, trojan-banker, trojan-dropper, trojan-sms, trojan-spy and zero-day.
A complete taxonomy of the malware families of the captured apps is created by dividing them into eight categories: sensitive data collection, media, hardware, actions/activities, internet connection, C&C, antivirus, and storage & settings.
Category | Number of families | Number of samples |
---|---|---|
Adware | 48 | 47,210 |
Backdoor | 11 | 1,538 |
File Infector | 5 | 669 |
No Category | - | 2,296 |
PUA | 8 | 2,051 |
Ransomware | 8 | 6,202 |
Riskware | 21 | 97,349 |
Scareware | 3 | 1,556 |
Trojan | 45 | 13,559 |
Trojan-Banker | 11 | 887 |
Trojan-Dropper | 9 | 2,302 |
Trojan-SMS | 11 | 3,125 |
Trojan-Spy | 11 | 3,540 |
Zero-day | - | 13,340 |
AndroidManifest.xml contains many features that can be used for static analysis; a sketch of extracting them from a decoded manifest follows the table below. The main extracted features include:
Feature | Values |
---|---|
Package Name | "com.fb.iwidget" |
Activities | "com.fb.iwidget.OverlayActivity" "org.acra.CrashReportDialog" "com.batch.android.BatchActionActivity" "com.fb.iwidget.MainActivity" "com.fb.iwidget.PreferencesActivity" "com.fb.iwidget.PickerActivity" "com.fb.iwidget.IntroActivity" |
Services | "com.batch.android.BatchActionService" "com.fb.iwidget.MainService" "com.fb.iwidget.SnapAccessService" |
Receivers/Providers | "com.fb.iwidget.ExpandWidgetProvider" "com.fb.iwidget.ActionReceiver" |
Intents Actions | "android.accessibilityservice.AccessibilityService" "android.appwidget.action.APPWIDGET_UPDATE" "android.intent.action.BOOT_COMPLETED" "android.intent.action.CREATE_SHORTCUT" "android.intent.action.MAIN" "android.intent.action.MY_PACKAGE_REPLACED" "android.intent.action.USER_PRESENT" "android.intent.action.VIEW" "com.fb.iwidget.action.SHOULD_REVIVE" |
Intents Categories | "android.intent.category.BROWSABLE" "android.intent.category.DEFAULT" "android.intent.category.LAUNCHER" |
Permissions | "android.permission.ACCESS_NETWORK_STATE" "android.permission.CALL_PHONE" "android.permission.INTERNET" "android.permission.RECEIVE_BOOT_COMPLETED" "android.permission.SYSTEM_ALERT_WINDOW" "com.android.vending.BILLING" "android.permission.BIND_ACCESSIBILITY_SERVICE" |
Meta-Data | "android.accessibilityservice" "android.appwidget.provider" |
#Icons | 331 |
#Pictures | 0 |
#Videos | 0 |
#Audio files | 0 |
Size of the App | 4.2 MB |
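Here is a minimal sketch of pulling such features from a manifest. It assumes the binary AndroidManifest.xml has already been decoded to plain XML (e.g. with apktool or androguard); the file path is a placeholder.

```python
import xml.etree.ElementTree as ET

# Parse a *decoded* AndroidManifest.xml (APK manifests are binary XML and
# must first be decoded, e.g. with apktool). The path is a placeholder.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
root = ET.parse("AndroidManifest.xml").getroot()

features = {
    "package": root.get("package"),
    "permissions": [e.get(ANDROID_NS + "name")
                    for e in root.iter("uses-permission")],
    "activities": [e.get(ANDROID_NS + "name")
                   for e in root.iter("activity")],
    "services": [e.get(ANDROID_NS + "name")
                 for e in root.iter("service")],
    "intent_actions": [e.get(ANDROID_NS + "name")
                       for e in root.iter("action")],
}
for key, value in features.items():
    print(key, value)
```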
To understand the behavioral changes of these malware categories and families, six categories of features are extracted after executing the malware in an emulated environment.
Git LFS Details
Origin: https://huggingface.co/TheBloke/LLaMa-7B-GGML
SHA256: bcb95f6755597f26046ab2d5ebea51bf1418f440a96e1563f0fecc379c2cbee3
Size of remote file: 3.79 GB
Meta's LLaMA 7B GGML
These files are GGML format model files for Meta's LLaMA 7B.
GGML files are for CPU + GPU inference using llama.cpp and libraries and UIs which support this format, such as:
- KoboldCpp, a powerful GGML web UI with full GPU acceleration out of the box. Especially good for storytelling.
- LoLLMS Web UI, a great web UI with GPU acceleration via the c_transformers backend.
- LM Studio, a fully featured local GUI. Supports full GPU acceleration on macOS. Also supports Windows, without GPU acceleration.
- text-generation-webui, the most popular web UI. Requires extra steps to enable GPU acceleration via the llama.cpp backend.
- ctransformers, a Python library with LangChain support and an OpenAI-compatible AI server.
- llama-cpp-python, a Python library with an OpenAI-compatible API server.

These files were quantised using hardware kindly provided by Latitude.sh.
Repositories available
- GPTQ models for GPU inference, with multiple quantisation parameter options
- 2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference
- Unquantised fp16 model in PyTorch format, for GPU inference and for further conversions: https://huggingface.co/huggyllama/llama-7b

Prompt template: None
Compatibility
Original llama.cpp quant methods: q4_0, q4_1, q5_0, q5_1, q8_0
These are guaranteed to be compatible with any UIs, tools and libraries released since late May. They may be phased out soon, as they are largely superseded by the new k-quant methods.
New k-quant methods: q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q6_K These new quantisation methods are compatible with llama.cpp as of June 6th, commit 2d43387.
They are now also compatible with recent releases of text-generation-webui, KoboldCpp, llama-cpp-python, ctransformers, rustformers and most others. For compatibility with other tools and libraries, please check their documentation.
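For instance, loading one of these files with llama-cpp-python (mentioned above) takes only a few lines. The sketch below assumes a llama-cpp-python release from the GGML era; the model file name is a placeholder for whichever quantisation you downloaded.

```python
from llama_cpp import Llama

# Load a GGML quantisation of LLaMA 7B. The file name is a placeholder for
# whichever quant you downloaded; GGML support reflects llama-cpp-python
# versions from mid-2023 (later releases expect GGUF files instead).
llm = Llama(model_path="llama-7b.ggmlv3.q4_0.bin", n_ctx=2048)

output = llm("Q: What is the capital of France? A:", max_tokens=32)
print(output["choices"][0]["text"])
```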