Big Data Science

Categories: Technologies

Language: English

Subscribers: 1.44K

Description from channel

The Big Data Science channel gathers interesting facts about Data Science.
For cooperation: a.chernobrovov@gmail.com
💼 — https://t.me/bds_job — channel about Data Science jobs and career
💻 — https://t.me/bdscience_ru — Big Data Science [RU]

The latest messages

2022-04-04 07:11:06 Noise Reduction in Quantum Computing: An MIT Study
Quantum computers are highly sensitive to noise caused by imperfect control signals, environmental disturbances, and unwanted interactions between qubits. To address this, MIT researchers created QuantumNAS, a framework that identifies the most noise-robust quantum circuit for a particular computational problem and generates a qubit mapping tailored to the target quantum processor. QuantumNAS is much less computationally intensive than other search methods and can find quantum circuits that improve accuracy on machine learning and quantum chemistry tasks. In classical neural networks, adding parameters often improves model accuracy. In variational quantum computing, however, more parameters require more quantum gates, which introduces more noise.
To find robust circuits, the researchers first designed a "super-circuit" that contains all possible parameterized quantum gates in the design space. This super-circuit was trained and then used to search for circuit architectures with high noise tolerance. The search explores quantum circuits and qubit mappings jointly using an evolutionary algorithm. The algorithm generates several candidate circuit and qubit-mapping pairs and evaluates their accuracy with a noise model or on a real quantum machine. The results are fed back into the algorithm, which keeps the best-performing candidates and uses them to start the next round, repeating until it finds the best candidates. The developers have released the results of the study in the open-source TorchQuantum library: https://github.com/mit-han-lab/torchquantum.
https://news.mit.edu/2022/quantum-circuits-robust-noise-0321
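The evolutionary search loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not QuantumNAS itself and not the TorchQuantum API. The following are all hypothetical:

- the candidate encoding: one gate choice per layer plus a logical-to-physical qubit permutation;
- the mutation operator;
- `toy_noisy_score`, which stands in for "accuracy under a noise model".

The sketch also omits the trained super-circuit, whose shared weights let the real method score sub-circuits cheaply.

```python
import random
from dataclasses import dataclass

N_LAYERS = 6                                 # hypothetical number of layers in the design space
N_QUBITS = 4                                 # hypothetical number of (logical = physical) qubits
GATE_OPTIONS = ("none", "rx", "ry", "cnot")  # toy per-layer gate choices
NOISY_PHYSICAL_QUBIT = 3                     # hypothetical "bad" qubit on the target device


@dataclass(frozen=True)
class Candidate:
    layers: tuple   # gate chosen for each layer (a sub-circuit of the super-circuit)
    mapping: tuple  # mapping[i] = physical qubit assigned to logical qubit i


def toy_noisy_score(c: Candidate) -> float:
    """Toy stand-in for accuracy under noise: reward gate variety, penalize every
    gate (more gates = more noise) and placing logical qubit 0 on the noisy qubit."""
    gate_types = {g for g in c.layers if g != "none"}
    gate_count = sum(g != "none" for g in c.layers)
    penalty = 0.3 * gate_count + (1.0 if c.mapping[0] == NOISY_PHYSICAL_QUBIT else 0.0)
    return len(gate_types) - penalty


def random_candidate(rng: random.Random) -> Candidate:
    mapping = list(range(N_QUBITS))
    rng.shuffle(mapping)
    layers = tuple(rng.choice(GATE_OPTIONS) for _ in range(N_LAYERS))
    return Candidate(layers, tuple(mapping))


def mutate(c: Candidate, rng: random.Random) -> Candidate:
    """Either change one layer's gate or swap two entries of the qubit mapping."""
    layers, mapping = list(c.layers), list(c.mapping)
    if rng.random() < 0.5:
        layers[rng.randrange(N_LAYERS)] = rng.choice(GATE_OPTIONS)
    else:
        i, j = rng.sample(range(N_QUBITS), 2)
        mapping[i], mapping[j] = mapping[j], mapping[i]
    return Candidate(tuple(layers), tuple(mapping))


def evolve(score, generations=30, pop_size=20, n_parents=5, seed=0) -> Candidate:
    """Joint search over circuits and mappings: keep the best, mutate them, repeat."""
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[:n_parents]  # elitism: the best candidates survive unchanged
        children = [mutate(rng.choice(parents), rng) for _ in range(pop_size - n_parents)]
        population = parents + children   # scores of this generation feed the next one
    return max(population, key=score)


best = evolve(toy_noisy_score)
print(best, toy_noisy_score(best))
```

Because the parents are carried over unchanged, the best score never decreases from one generation to the next. In QuantumNAS the expensive part is the scoring step, which this sketch replaces with a cheap formula.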

2022-04-01 07:47:23

#test
Gradient boosting is based on

Anonymous Quiz

SVM

logistic regression

ensemble of decision trees (77%)

linear regression (10%)

86 voters
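For reference, gradient boosting in its most common form builds an ensemble of decision trees. Each new tree is fitted to the negative gradient of the loss for the current ensemble; for squared error, this is simply the residuals. The tree's prediction is then added with a small learning rate.

Below is a minimal pure-Python sketch that uses one-split trees (stumps) on a single feature. The function names and data are illustrative. Real libraries such as XGBoost or LightGBM use deeper trees, regularization and many other refinements.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the residuals by least squares."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate splits between distinct x values
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)  # initial prediction: the mean target
    trees = []

    def predict(x):
        return base + learning_rate * sum(tree(x) for tree in trees)

    for _ in range(n_rounds):
        # For (halved) squared loss, the negative gradient w.r.t. F(x) is y - F(x)
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return predict


xs = list(range(10))
ys = [0.0] * 5 + [10.0] * 5  # a step function
model = gradient_boost(xs, ys)
print(round(model(0), 2), round(model(9), 2))  # values close to 0 and 10
```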

2022-03-30 06:27:59 Generation of 3D scenes from 2D photos with NVIDIA's NeRF
Inverse rendering uses AI to approximate how light behaves in the real world, making it possible to reconstruct a 3D scene from a handful of 2D images taken from different angles. The NVIDIA research team has developed an approach that performs this task almost instantly by combining ultra-fast neural network training with fast rendering.
NVIDIA applied this approach to a popular new technology called Neural Radiance Fields, or NeRF. The result, dubbed Instant NeRF, is the fastest NeRF technique to date, achieving more than a 1,000x speedup in some cases. The model needs only a few seconds to train on a few dozen still photos, plus the camera angles they were taken from. It can then render the resulting 3D scene within tens of milliseconds.
NeRFs use neural networks to represent and render realistic 3D scenes from an input collection of 2D images. Collecting data to feed a NeRF is a bit like working as a red-carpet photographer: the neural network needs a few dozen images taken from different positions around the scene, along with the camera position for each of them.
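To render an image, a NeRF queries its network for a density σ and a color at sample points along each camera ray, then blends them front to back.

Below is a minimal pure-Python sketch of that compositing step. It follows the discrete volume rendering rule from the original NeRF paper:

- weight_i = T_i * (1 - exp(-σ_i * δ_i));
- δ_i is the spacing between samples;
- T_i is the fraction of light that survives to sample i.

The neural network itself is omitted. The densities, colors and spacings are assumed inputs.

```python
import math


def composite_ray(sigmas, colors, deltas):
    """Alpha-composite samples along one ray.

    sigmas: densities at each sample, colors: (r, g, b) per sample,
    deltas: distance between consecutive samples. Returns (rgb, weights).
    """
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0  # T_i: probability the ray reaches sample i unblocked
    weights = []
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        w = transmittance * alpha
        weights.append(w)
        for k in range(3):
            rgb[k] += w * color[k]
        transmittance *= 1.0 - alpha  # light left after passing this segment
    return rgb, weights


# A dense red sample in front of a green one: the pixel comes out (almost) pure red
rgb, weights = composite_ray([50.0, 50.0], [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [1.0, 1.0])
print(rgb, weights)
```

Because every step is differentiable, the NeRF network can be trained by comparing such rendered pixels with the input photos.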
Creating a 3D scene with traditional methods typically takes several hours or more, depending on the complexity and resolution of the rendering. Bringing AI into the picture speeds things up: early NeRF models rendered crisp, artifact-free scenes in minutes, but they still took hours to train. Instant NeRF cuts rendering time by several orders of magnitude. It relies on a multiresolution hash grid encoding that is optimized to run efficiently on NVIDIA GPUs, which makes it possible to achieve high-quality results with a small, fast neural network.
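The multiresolution hash grid encoding can be sketched in 2D. At each resolution level:

1. The four grid corners around a point are hashed into a small table of feature vectors.
2. The corner features are bilinearly interpolated.

The interpolated features from all levels are concatenated and fed to the small neural network.

Below is a simplified, illustrative version in pure Python. Some parts are assumptions and others follow the paper:

- The level count, table size and growth factor are arbitrary assumptions.
- The hash uses the XOR-of-primes scheme from the Instant NGP paper.
- Random values stand in for the trained tables.
- The sketch skips the paper's dense indexing of coarse levels.
- The real implementation works in 3D with CUDA kernels from tiny-cuda-nn.

```python
import math
import random

PRIMES = (1, 2654435761)  # per-dimension primes of the Instant NGP spatial hash (2D part)


def spatial_hash(ix: int, iy: int, table_size: int) -> int:
    """Map an integer grid corner to a slot in the feature table."""
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1])) % table_size


class HashGridEncoding2D:
    def __init__(self, n_levels=4, table_size=2**10, n_features=2,
                 base_res=4, growth=2.0, seed=0):
        rng = random.Random(seed)
        self.table_size = table_size
        # Grid resolution grows geometrically from coarse to fine levels
        self.resolutions = [math.floor(base_res * growth**level) for level in range(n_levels)]
        # One feature table per level; small random values stand in for trained parameters
        self.tables = [
            [[rng.uniform(-1e-4, 1e-4) for _ in range(n_features)] for _ in range(table_size)]
            for _ in range(n_levels)
        ]

    def encode(self, x: float, y: float) -> list:
        """x, y in [0, 1]; returns bilinearly interpolated features from all levels, concatenated."""
        out = []
        for res, table in zip(self.resolutions, self.tables):
            gx, gy = x * res, y * res
            x0, y0 = math.floor(gx), math.floor(gy)
            fx, fy = gx - x0, gy - y0
            feat = [0.0] * len(table[0])
            for dx, wx in ((0, 1.0 - fx), (1, fx)):
                for dy, wy in ((0, 1.0 - fy), (1, fy)):
                    entry = table[spatial_hash(x0 + dx, y0 + dy, self.table_size)]
                    for k, v in enumerate(entry):
                        feat[k] += wx * wy * v
            out.extend(feat)
        return out


enc = HashGridEncoding2D()
print(len(enc.encode(0.3, 0.7)))  # n_levels * n_features values
```

The tables are small and hash collisions are tolerated. The coarse levels give each point a unique context, and training resolves collisions at the fine levels. This lets a tiny network reach high quality quickly.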
The model was developed with the NVIDIA CUDA Toolkit and the Tiny CUDA Neural Networks (tiny-cuda-nn) library. Because the network is lightweight, it can be trained and run on a single NVIDIA GPU; it runs fastest on cards with NVIDIA Tensor Cores.
This technology could help train robots and self-driving cars to understand the size and shape of real-world objects from 2D images or video footage of them. It can also be used in architecture and entertainment to quickly create digital representations of real environments that creators can modify and build on.
https://blogs.nvidia.com/blog/2022/03/25/instant-nerf-research-3d-ai/

2022-03-28 06:36:14 TOP-15 Data Science conferences in April 2022:
• Apr 5-6, Healthcare NLP Summit (Online training takes place Apr 12-15) https://www.nlpsummit.org/healthcare-2022/
• Apr 6, Google Data Cloud Summit. Virtual. https://cloudonair.withgoogle.com/events/summit-data-cloud-2022
• Apr 13-14, Unite 2022: The Collaborative Intelligence Summit. Atlanta, GA, USA. https://unite2022.com/
• Apr 13, Analytics Summit 2022. Cincinnati, OH, USA. https://web.cvent.com/event/c6511810-01df-4e56-8c98-9c649301e3e4/
• Apr 14-16, WAICF: World AI Cannes Festival. Cannes, France. https://worldaicannes.com/
• Apr 19-21, ODSC East: Open Data Science Conference. Boston, MA, USA. https://odsc.com/boston/
• Apr 20, DSS Virtual: AI & ML in the Enterprise. Virtual. https://www.datascience.salon/virtual-ai-and-ml-enterprise/
• Apr 21-22, RE.WORK AI in Finance Summit. New York, NY, USA. https://www.re-work.co/events/ai-in-finance-summit-new-york-2022
• Apr 21-22, RE.WORK AI in Insurance Summit. New York, NY, USA. https://www.re-work.co/events/ai-in-insurance-summit-new-york-2022
• Apr 25-27, Data Governance, Quality, and Compliance https://tdwi.org/events/seminars/april/data-governance-quality-compliance/home.aspx
• Apr 25-26, Chief Data & Analytics Officers, APEX East. Fort Myers, FL, USA. https://cdao-apex-east.coriniumintelligence.com/
• Apr 25-29, International Conference on Learning Representations (ICLR) https://www.iclr.cc/Conferences/2022
• Apr 26-27, Insurance AI & Innovative Tech USA 2022. Chicago, IL, USA. https://events.reutersevents.com/insurance/insuranceai-usa
• Apr 27, 4-6PM GMT, Natural Language Generation: Financial services, humans + AI together. London, UK. https://www.meetup.com/london-nlg-meetup-group/events/284525082/
• Apr 27, Computer Vision Summit. San Jose, CA, USA. https://computervisionsummit.com/

2022-03-25 09:31:13

#test
The first step in experiment design is

Anonymous Quiz

to calculate the p-value

to define a hypothesis (87%)

to collect datasets for testing

to fill in the confusion matrix

68 voters

2022-03-23 08:41:17 Useful ML Services: Everypixel API for Image Recognition
We continue our series on useful ML tools. Meet the Everypixel API, a simple yet powerful visual recognition service that uses machine learning to understand images.
The API uses a set of pre-trained models that analyze images and return useful information. It tags images with relevant keywords, which helps with categorization and moderation. It also scores images by quality and aesthetic value. This makes it a good fit for online stores and marketplaces that want to enrich product and image data. Images can be uploaded without hand-written descriptions, because the keywords are filled in automatically. The generated keywords also help with SEO, and image categorization improves search and catalog navigation.
Pros of Everypixel API:
• works even when the end user takes a picture at the wrong angle or in poor lighting conditions;
• sees images the way a person sees them;
• can create keywords associated with images;
• selects the best shot from several similar photos;
• can rate images from 0 to 100 depending on their quality.
Disadvantages of Everypixel API:
• the free plan is limited to 100 requests per day;
• cannot rate historical photographs, illustrations, or 3D renderings.
https://labs.everypixel.com/api

2022-03-21 06:20:40 MLOps basics: 5 formats for transferring ML models
ML systems need models that can move between the stages of the life cycle, from development to production deployment. For example, a data scientist writes code in notebooks such as Jupyter Notebook or Google Colab. When this code moves into production, the trained model is usually serialized into a compact interchange format, ideally one that does not depend on the development language. Common formats include:
• Pickle is Python's built-in binary serialization format: it converts a hierarchy of Python objects into a byte stream and back. Unlike the other formats listed here, it is Python-specific;
• ONNX (Open Neural Network Exchange) is an open-source format for ML models that provides a common set of operators and a universal file format for a variety of platforms and tools. An ONNX file describes a self-contained computation graph (inputs, outputs, and operations). The format focuses on deep learning, is backed by Microsoft and Facebook, and can be produced from PyTorch and TensorFlow models;
• PMML (Predictive Model Markup Language) is an XML-based exchange format for predictive models. A model developed in one system can be deployed in another application by passing it an XML file;
• PFA (Portable Format for Analytics) is a standard for statistical models and data transformation engines that is easily portable between different systems. Pre-processing and post-processing functions can be chained together into complex workflows. A PFA document, written in JSON or YAML, can describe anything from a simple raw-data transformation to a complex set of parallel data mining models;
• NNEF (Neural Network Exchange Format) simplifies the deployment of trained neural networks, letting models produced by different training tools run on a variety of devices and platforms.
There are also framework-specific formats, such as POJO/MOJO for the H2O AutoML platform and Spark MLWritable for Apache Spark.
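Below is a minimal sketch of the Pickle round trip using the standard library. `LinearModel` is a hypothetical stand-in for a trained estimator.

Keep two caveats in mind:

- Pickle stores a reference to the class rather than its code, so the loading environment must be able to import the same class.
- Unpickling untrusted data can execute arbitrary code.

Both points help explain why language-neutral formats such as ONNX or PMML are often preferred when models cross system boundaries.

```python
import pickle
from dataclasses import dataclass


@dataclass
class LinearModel:
    """Hypothetical 'trained model': y = w . x + b."""
    weights: list
    bias: float

    def predict(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias


model = LinearModel(weights=[0.5, -1.25], bias=2.0)

blob = pickle.dumps(model)     # serialize the object hierarchy to bytes
restored = pickle.loads(blob)  # deserialize: only ever do this with trusted data

print(restored.predict([2.0, 1.0]) == model.predict([2.0, 1.0]))
```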

2022-03-18 09:46:11

#test
What is significance level in hypothesis testing?

Anonymous Quiz

p-value (44%)

probability of rejecting the null hypothesis given that it is true (39%)

probability of rejecting the alternative hypothesis given that it is true (11%)

probability of rejecting the alternative hypothesis given that it is false

70 voters
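The quiz answer can be checked numerically. The significance level α is fixed before the test and equals P(reject H0 | H0 is true), the Type I error rate. The p-value, by contrast, is computed from the observed data and then compared against α.

Below is a minimal simulation. It assumes a two-sided one-sample z-test with known σ = 1, a hypothetical setup chosen for simplicity. When the data really come from H0, the share of rejections should be close to α.

```python
import random
from statistics import NormalDist


def z_test_rejects(sample, mu0, sigma, alpha):
    """Two-sided one-sample z-test with known sigma; True if H0: mean == mu0 is rejected."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / n ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def empirical_type_i_error(alpha=0.05, n=30, trials=20_000, seed=42):
    """Fraction of rejections when every sample is drawn under H0 (true mean 0, sigma 1)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_rejects(sample, mu0=0.0, sigma=1.0, alpha=alpha):
            rejections += 1
    return rejections / trials


print(empirical_type_i_error())  # should land close to alpha = 0.05
```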

2022-03-16 10:17:21 ML to protect against DDoS attacks
Machine learning algorithms are widely used in cybersecurity, for example to identify atypical user behavior that may indicate unauthorized access. ML can also help protect against DDoS attacks. The goal of a DDoS attack is to disrupt an organization by flooding a network, an Internet-connected service, or the technical infrastructure around the target with unwanted traffic. The volume of this traffic can severely degrade the target's availability or take it offline entirely.
DDoS attacks rely on Internet-connected devices that have already been compromised by malware. An attacker exploits existing vulnerabilities in dozens, hundreds, thousands, or even millions of devices to gain remote control over them. IoT devices are now ubiquitous, to the point that even a home refrigerator may be online. As a result, DDoS protection is relevant for businesses and private households alike.
A 2017 Kaspersky Lab survey found that a DDoS attack cost small and medium-sized businesses $120,000; for large enterprises, the figure was $2 million. A 2018 study estimated the cost of downtime for a large organization at $300,000 to $540,000. According to a 2020 IBM report, the average cost of a data breach in the US was $8.46 million.
Using ML, you can build a binary classification model that mitigates the impact of a DDoS attack by correctly distinguishing benign traffic from malicious traffic. The model must do two things:
• keep the false positive rate (benign traffic wrongly flagged as malicious) low;
• detect malicious traffic with a recall (true positive rate) of at least 90%.
Implementation example with Dask, XGBoost and Stacked Ensembling:
https://towardsdatascience.com/mitigating-ddos-attacks-with-classification-models-aa75ea813d85
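The requirement above (a recall of at least 90% with as few false positives as possible) comes down to choosing a decision threshold on a classifier's scores.

Below is a minimal sketch in plain Python. It assumes you already have a model that outputs a maliciousness score per traffic flow. The labels and scores in the usage example are made up, and this is not the linked article's Dask/XGBoost pipeline.

```python
def confusion_rates(labels, scores, threshold):
    """labels: 1 = malicious, 0 = benign. Flag as malicious when score >= threshold.
    Returns (recall, false_positive_rate)."""
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, fpr


def pick_threshold(labels, scores, min_recall=0.9):
    """Among thresholds that reach min_recall, return (threshold, fpr, recall) with the lowest FPR."""
    best = None
    for t in sorted(set(scores)):
        recall, fpr = confusion_rates(labels, scores, t)
        if recall >= min_recall and (best is None or fpr < best[1]):
            best = (t, fpr, recall)
    return best


labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.5, 0.7, 0.8, 0.9, 0.95]
print(pick_threshold(labels, scores))  # (0.5, 0.2, 1.0)
```

In practice the threshold should be chosen on a validation set and then checked on held-out traffic, because attack patterns drift over time.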

2022-03-14 19:14:17 BioPass ID - REST API of a SaaS product for face recognition
BioPass ID is a platform that combines biometric data processing with artificial intelligence for building ID products. Its RESTful API supports the following tasks:
• enrolling, managing and verifying people;
• matching and extracting biometric data;
• compressing and decompressing fingerprint images;
• detecting faces and detecting spoofed (fake) faces;
• anonymizing faces;
• performing quality checks.
BioPass ID is an online cloud service that provides powerful multi-biometric and artificial intelligence technology for building any Internet-enabled service, software, or platform. As a Biometrics-as-a-Service product, BioPass ID supports any programming language, sensor model, camera, or platform, enabling fast and easy implementation and system integration.
Images are common parameters in BioPass ID requests. To send them in API requests, you need to encode them as base64 strings. If a string is not valid base64, the call returns a bad request response with the message "Invalid JSON format".
https://www.biopassid.com/
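Below is a minimal standard-library sketch of preparing an image for such a request: encode the raw bytes as base64 and embed the string in a JSON body. The sketch does not call the API. The `"image"` key is a hypothetical field name; consult the BioPass ID documentation for the real endpoint and request schema.

```python
import base64
import binascii
import json


def encode_image(image_bytes: bytes) -> str:
    """Return the image as a base64 string that can be embedded in JSON."""
    return base64.b64encode(image_bytes).decode("ascii")


def is_valid_base64(s: str) -> bool:
    """Client-side check: reject strings that are not strict base64
    (the API answers such input with a bad request)."""
    try:
        base64.b64decode(s, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


def build_request_body(image_bytes: bytes) -> str:
    # "image" is a hypothetical key name, not taken from the BioPass ID docs
    return json.dumps({"image": encode_image(image_bytes)})


# In real use, read the bytes from a file: open("face.jpg", "rb").read()
body = build_request_body(b"\xff\xd8\xff\xe0 fake jpeg bytes")
print(body)
```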
