
Big Data Science

Channel address: @bdscience
Categories: Technologies
Language: English
Subscribers: 1.44K
Description from channel

The Big Data Science channel gathers all the interesting facts about Data Science.
For cooperation: a.chernobrovov@gmail.com
💼 — https://t.me/bds_job — channel about Data Science jobs and career
💻 — https://t.me/bdscience_ru — Big Data Science [RU]

Ratings & Reviews

1.67 (3 reviews). Reviews can be left only by registered users; all reviews are moderated by admins.
5 stars: 0 · 4 stars: 0 · 3 stars: 1 · 2 stars: 0 · 1 star: 2


Latest messages (16)

2021-09-27 08:22:21 3 Things You Didn't Know About Python Memory Usage
Since Python is the main programming language for Data Science tasks, every DS specialist will benefit from knowing the following features of the tool:
Obtaining the address of an object in memory: Python provides the id() function, which returns the memory address of an object as an integer (in CPython).
Garbage collection: Python uses reference counting to decide when an object should be removed from memory. It tracks the number of references to each object, and once an object is no longer referenced, it is reclaimed. For complex cases (such as reference cycles), you can adjust the garbage collector's behavior manually via the gc module.
Interning, or integer caching: to save time and memory, Python preloads all integers in the range [-5, 256]. When a new integer variable in this range is declared, Python simply references the cached integer instead of creating a new object. So no matter how many variables refer to, say, the integer 256, they all point to the same memory address of the cached integer. Python has a similar interning mechanism for short strings.
https://medium.com/techtofreedom/memory-management-in-python-3-popular-interview-questions-bce4bc69b69a
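The three features above can be observed directly in a CPython session (the `is` results for out-of-range integers are a CPython implementation detail, not a language guarantee):

```python
import gc
import sys

# 1. id() returns the object's memory address (in CPython) as an integer.
x = [1, 2, 3]
print(isinstance(id(x), int))  # True; the actual address varies per run

# Reference counting: getrefcount reports one more than you might expect,
# because the call itself holds a temporary reference.
print(sys.getrefcount(x) >= 2)  # True

# 2. The gc module gives manual control over the cycle collector.
gc.disable()
gc.enable()
print(gc.isenabled())  # True

# 3. Integer caching: values in [-5, 256] share a single cached object.
a, b = 256, 256
print(a is b)  # True: both names point to the cached 256

# Outside the cached range, separately constructed equal ints are
# distinct objects (in CPython).
c, d = int("257"), int("257")
print(c is d)  # False
```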
2021-09-26 14:01:03 Top 10 most interesting DS conferences around the world in October 2021
1. 5-8 Oct - NLP Summit, Applied Natural Language Processing. Online. https://www.nlpsummit.org/nlp-2021/
2. 6-7 Oct - TransformX AI Conference, with 100+ speakers including Andrew Ng and Fei-Fei Li; free and open to the public. Online. https://www.aicamp.ai/event/eventdetails/W2021100608
3. 6-9 Oct - The 8th IEEE International Conference on Data Science and Advanced Analytics, Porto, Portugal. https://dsaa2021.dcc.fc.up.pt/
4. 12-14 Oct - Google Cloud Next '21, a global digital experience. Online. https://cloud.withgoogle.com/next
5. 12-14 Oct - Chief Data & Analytics Officers (CDAO). Online. https://cdao-fall.coriniumintelligence.com/virtual-home
6. 13-14 Oct - Big Data and AI Toronto. Online. https://www.bigdata-toronto.com/register
7. 15-17 Oct - DataScienceGO, UCLA Campus, Los Angeles, USA. https://www.datasciencego.com/united-states
8. 19 Oct - Graph + AI Summit Fall 2021, an open conference on accelerating analytics and AI with graphs. New York, NY, USA and virtual. https://info.tigergraph.com/graphai-fall
9. 20-21 Oct - RE.WORK Conversational AI for Enterprise Summit. Online. https://www.re-work.co/summits/conversational-ai-enterprise-summit
10. 21 Oct - DSS Mini Salon: The Future of Retail with Behavioral Data. Online. https://www.datascience.salon/snowplow-analytics-mini-virtual-salon
2021-09-24 18:18:25 Register for the free international online conference DataArt IT NonStop 2021!

IT NonStop will be held on November 18-20, 2021.
This year, we will focus on Cloud, Data, and Machine Learning & Artificial Intelligence. Market leaders will take the stage to share their knowledge, case studies, and best solutions. The main working language of the conference is English; however, there will be a special Junior track on November 20, delivered mostly in Russian. November 20 will also be dedicated to workshops.

More than 30 speakers from Microsoft, AWS, Ocado, Codete, Ciklum, Eleks, SoftServe, Toloka, Yandex, DataArt, and other market leaders will take the stage at IT NonStop 2021. We can't list all of them in one post, so here are a selected few workshops:
— "Creating Real-Time Data Streaming powered by SQL on Kubernetes", Albert Lewandowski, Big Data DevOps Engineer, GetInData.
— "Create your own cognitive portrait in 60 minutes", Dmitry Soshnikov, Cloud Developer Advocate, Microsoft.
— "Training unbiased and accurate AI models", Robert Yenokyan, AI Lead, Pinsight.
The whole list of speakers and topics is available on our webpage, and it is constantly growing.
You can still sign up for the conference: registration is open and free for everyone!

Briefly about the IT NonStop Conference:
When: November 18-20
Venue: online and free of charge
Registration: https://it-nonstop.net/register-to-the-conference/?utm_source=bdscience&utm_medium=referral
2021-09-24 08:02:20 Need sentiment analytics for YouTube comments?
Over 2 billion users watch YouTube videos at least once a month, and popular YouTube bloggers rack up billions of views. But you can't please every subscriber, and public opinion is constantly changing. Build your own user sentiment analysis model with Youtube-Comment-Scraper, a Python library for collecting comments on YouTube videos using browser automation (it currently works only on Windows). This open-source project will help you create a dashboard that analyzes subscribers' attitudes toward the videos of popular youtubers. The work comes down to the following steps:
• collecting the relevant comments to the video from YouTube users;
• using a pretrained ML model to make a prediction for each comment;
• visualizing the model's predictions on a dashboard, e.g., using Dash in Python or Shiny in R.
Add interactivity to the sentiment analysis results with filters by release time, video author, and genre.
https://pypi.org/project/youtube-comment-scraper-python/
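The steps above can be sketched as follows. Everything here is a placeholder: in a real project the hard-coded comments would come from the scraper, and the toy keyword classifier would be replaced by a pretrained ML model:

```python
from collections import Counter

def predict_sentiment(comment: str) -> str:
    """Toy stand-in for a pretrained sentiment model (keyword matching only)."""
    positive = {"love", "great", "awesome"}
    negative = {"hate", "boring", "bad"}
    words = {w.strip(".,!?") for w in comment.lower().split()}
    if words & positive:
        return "positive"
    if words & negative:
        return "negative"
    return "neutral"

# Step 1: collect comments (hard-coded here instead of scraping YouTube).
comments = ["I love this video", "So boring...", "Interesting topic"]

# Step 2: run the "model" on each comment.
labels = [predict_sentiment(c) for c in comments]

# Step 3: aggregate the predictions for the dashboard (counts per class).
summary = Counter(labels)
print(summary["positive"], summary["negative"], summary["neutral"])  # 1 1 1
```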
2021-09-22 06:55:31 How to evaluate the quality of a multi-object tracking ML model in computer vision?
Tracking multiple objects in a real-world environment is challenging, partly because of the metrics used to evaluate the ML model, whose purpose is to assess tracking accuracy and verify the trajectory of a moving object. Suppose that for each frame in the video stream the tracking system outputs n hypotheses, while there are m ground-truth objects in the frame. The evaluation process is then as follows:
• Find the best match between the hypotheses and the ground truth based on their coordinates, using one of various matching algorithms.
• For each matched pair, compute the error in the object's position.
• Sum up the different kinds of errors: misses (the tracker produced no hypothesis for an object), false positives (the tracker produced a hypothesis, but the object was absent), and mismatch errors (the hypothesis assigned to a ground-truth object changed between frames, i.e., an identity switch).
The performance of the ML model can then be expressed with two metrics:
MOTP (Multi-Object Tracking Precision) shows how accurately object positions are estimated. It is the total position-estimation error over the matched ground-truth/hypothesis pairs across all frames, averaged over the total number of matches made. This metric says nothing about recognizing object configurations or evaluating object trajectories. It ranges from 0 to 1: a MOTP of 1 means poor localization accuracy, while a value close to zero means good accuracy.
MOTA (Multi-Object Tracking Accuracy) shows how many errors the tracking system made (misses, false positives, mismatch errors). It ranges from -inf to 1: a MOTA of 1 means good tracking accuracy, while a value near or below zero means poor accuracy.
https://pub.towardsai.net/multi-object-tracking-metrics-1e602f364c0c
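Given the error counts produced by the matching step, both metrics reduce to simple formulas; a minimal sketch (the example numbers are made up for illustration):

```python
def motp(total_position_error: float, total_matches: int) -> float:
    """Average localization error over all matched GT-hypothesis pairs."""
    return total_position_error / total_matches

def mota(misses: int, false_positives: int, mismatches: int, total_gt: int) -> float:
    """1 minus the ratio of all tracking errors to the number of GT objects."""
    return 1.0 - (misses + false_positives + mismatches) / total_gt

# Example: 100 ground-truth objects across all frames, 90 matched pairs
# with a summed (normalized) position error of 13.5.
print(round(motp(13.5, 90), 2))      # 0.15 -> low error: good localization
print(round(mota(4, 3, 2, 100), 2))  # 0.91 -> close to 1: good tracking
```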
2021-09-20 06:27:30 3 useful Python libraries for Data Scientists
JMESPath is a library that helps you query JSON. It is useful when working with large, deeply nested JSON documents or dictionaries. JMESPath exposes objects to JavaScript-style access, making your code easier to develop and test. It is also safe: if any part of the path doesn't exist, the JMESPath lookup function returns None. https://github.com/jmespath/jmespath.py
Inflection is a Ruby-inspired library that helps you handle complex string processing logic. It converts English words between singular and plural and transforms strings from CamelCase to underscore. It is useful when variable or data point names generated in another language or on another system need to be converted to Pythonic style in accordance with the PEP standards. https://github.com/jpvanhal/inflection
more-itertools is a library with a set of useful functions for a variety of development tasks. For example, it lets you write quick, elegant code to split one dictionary into multiple lists based on a common repeating key, or to iterate over several lists at once, with utilities for grouping, windowing, and recursively flattening nested iterables. https://github.com/more-itertools/more-itertools
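The "safe lookup" behavior described for JMESPath can be sketched in plain Python with nested dicts; this is a simplified analogue, not the library itself (the real JMESPath query language also supports projections, filters, and slices):

```python
def search(path: str, data):
    """Look up a dotted path like 'a.b.c' in nested dicts; None if absent."""
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # missing path yields None, never a KeyError
        current = current[key]
    return current

doc = {"user": {"name": "Alice", "address": {"city": "Porto"}}}
print(search("user.address.city", doc))  # Porto
print(search("user.phone", doc))         # None
```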
2021-09-17 06:07:19 4 best practices to improve efficiency when using the Google Cloud Translation API
A web service that dynamically translates between languages using Google's ML models supports over 100 languages and is widely used in practice. Knowing a few useful life hacks, you can reduce costs, increase performance, and improve the security of this translation API on websites.
1. Caching translated content not only reduces the number of calls to the Google Cloud Translation API, but also reduces the load and compute usage on internal web servers and databases. This optimizes application performance and reduces delivery costs. Caching can be configured at different levels of the application architecture: at the proxy level (NGINX or HAProxy), in memory in the application itself on the web servers, in an external memory caching service, or through a CDN.
2. Secure access based on the principle of least privilege. When accessing the Google Cloud Translation API, it is recommended to use a Google Cloud service account rather than API keys. A service account is a special type of identity that represents a non-human user and can be authorized to access data in Google APIs. Service accounts are not assigned passwords and cannot be used to log in through a browser, which minimizes this threat vector. Following the principle of least privilege, you can grant a least-privileged role with a minimal set of permissions for accessing the translation API.
3. Customizing translations. If your content includes domain- and context-specific terms, Google Cloud Translation API Advanced supports custom terminology through a glossary. You can also create and use your own translation models with Google AutoML Translation. Help users understand the potential for errors and inaccuracies by alerting them that content has been automatically translated by Google.
4. Budget control. The costs associated with the Google Cloud Translation API mainly depend on the number of characters sent to the API. For example, at $10 per million characters, translating a web page containing 20 million characters into 10 languages would cost $10 * 20 * 10 = $2,000. Setting up billing alerts in your environment will help you keep track of your budget.
https://cloud.google.com/blog/products/ai-machine-learning/four-best-practices-for-translating-your-website
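Best practice #1, in-memory caching in the application itself, can be sketched with functools.lru_cache. translate_remote is a hypothetical stand-in for a real Cloud Translation API call, not the actual client library:

```python
from functools import lru_cache

api_calls = {"count": 0}  # tracks how often we "hit" the remote API

def translate_remote(text: str, target_lang: str) -> str:
    """Hypothetical stand-in for a paid Cloud Translation API request."""
    api_calls["count"] += 1
    return f"[{target_lang}] {text}"

@lru_cache(maxsize=4096)
def translate(text: str, target_lang: str) -> str:
    """Cached wrapper: repeated content never reaches the API twice."""
    return translate_remote(text, target_lang)

translate("Hello", "de")
translate("Hello", "de")  # served from the cache, no second API call
print(api_calls["count"])  # 1
```

Since billing is per character sent, every cache hit is a direct cost saving on repeated page content.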
2021-09-15 13:12:52 Data Science in the city: we continue Citymobil's series of meetups on Data Science in geoservices, logistics, Smart City applications, and more. Join the 2nd online meetup on September 23 at 18:00 MSK. Expect interesting talks from DS practitioners at Citymobil, Optimate AI, and Yandex.Routing:
Maksim Shalankin (Data Scientist at Citymobil's geoservice) will talk about the life cycle of an ML model for travel-time prediction under heavy load
Sergey Sviridov (CTO at Optimate AI) will explain what is wrong with classical heuristics and combinatorial optimization methods for building optimal routes, and how they can be replaced with dynamic programming
Daniil Tararukhin (Head of the analytics group at Yandex.Routing) will share how traffic jams affect the search for an optimal route, and how this problem can be studied with simulation modeling.
After the talks, the speakers will answer questions from the audience.
Host: Alexey Chernobrovov
Register for free participation: https://citymobil.timepad.ru/event/1773649/
2021-09-15 10:35:08 Need to develop an app for real-time emotion recognition in video?
Use the Face Recognition API, an open-source project for face recognition and manipulation from Python or the command line. The ML model was built with a deep-learning face recognition algorithm and achieves 99.38% accuracy on the Labeled Faces in the Wild benchmark.
With the Face Recognition API, application development consists of 5 steps:
• receiving video in real time;
• applying ready-to-use Python functions from the API to detect faces and emotions on objects in the video stream;
• classifying emotions into categories;
• developing a recommendation system;
• building the application and deploying it to Heroku, Dash, or a web server.
https://github.com/ageitgey/face_recognition
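The classification step (emotions into categories) can be sketched as follows, with the detection model stubbed out: scores_per_face is made-up data standing in for the per-face emotion scores a real model would return for each frame:

```python
EMOTIONS = ["happy", "sad", "angry", "neutral"]

def classify(scores: list[float]) -> str:
    """Pick the emotion category with the highest model score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return EMOTIONS[best]

# One score vector per detected face in a frame (made-up numbers standing
# in for real model output).
scores_per_face = [
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.60, 0.20, 0.15],
]
labels = [classify(s) for s in scores_per_face]
print(labels)  # ['happy', 'sad']
```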
2021-09-13 05:32:50 Web scraping automation: 3 popular tools
Do you want to track prices in an online store or automate ordering food from a restaurant? Try the following tools:
Selenium is a well-known test automation framework that can be used to simulate user behavior and perform actions on websites, such as filling out forms, clicking buttons, etc. https://selenium-python.readthedocs.io/
Beautiful Soup is a Python package for parsing HTML and XML documents. It creates a parse tree that can be used to extract data from web pages. Very good for simple projects. https://pypi.org/project/beautifulsoup4/
Scrapy is a fast, high-level website crawling and scraping framework used to extract structured data for mining, monitoring, and automated testing. It is great for complex projects and is much faster than the aforementioned alternatives. https://docs.scrapy.org/en/latest/