
Big Data Science

Channel address: @bdscience
Categories: Technologies
Language: English
Subscribers: 1.44K
Description from channel

Big Data Science channel gathers together all interesting facts about Data Science.
For cooperation: a.chernobrovov@gmail.com
💼 — https://t.me/bds_job — channel about Data Science jobs and career
💻 — https://t.me/bdscience_ru — Big Data Science [RU]

Ratings & Reviews

Rating: 1.67 (3 reviews)


5 stars: 0
4 stars: 0
3 stars: 1
2 stars: 0
1 star: 2


The latest Messages

2022-08-31 07:01:36 Top 15 DS events in September 2022 around the world:
1. Sep 7-8 • AI for Defense Summit • Washington, DC, USA https://ai.dsigroup.org/
2. Sep 7-9 • Southern Data Science Conference 2022 • Atlanta, GA, USA https://www.southerndatascience.com/
3. Sep 12-14 • TDWI Data Governance, Quality, and Compliance • Virtual https://tdwi.org/events/seminars/september/data-governance-quality-compliance/home.aspx
4. Sep 13-14 • Chief Data & Analytics Officers, Brazil • Brazil https://cdao-brazil.coriniumintelligence.com/
5. Sep 13-14 • Edge AI Summit • Santa Clara, CA, USA https://edgeaisummit.com/events/edge-ai-summit
6. Sep 13-15 • AI Hardware Summit • Santa Clara, CA, USA https://www.aihardwaresummit.com/events/aihardwaresummit
7. Sep 14-15 • Deep Learning Summit • London, UK https://www.re-work.co/events/deep-learning-summit-london-2022
8. Sep 14-15 • AI in Retail Summit • London, UK https://www.re-work.co/events/ai-in-retail-summit-london-2022
9. Sep 14-15 • Conversational AI Summit • London, UK https://www.re-work.co/events/conversational-ai-summit-london-2022
10. Sep 15-16 • The Modern Data Stack Conference • San Francisco, CA, USA https://www.moderndatastackconference.com/
11. Sep 21-22 • Big Data LDN • London, UK https://bigdataldn.com/
12. Sep 22 • EM Biotech Connect 2022 • Boston, MA, USA https://elementalmachines.com/em-biotech-connect-2022-0
13. Sep 22 • data.world Summit • Virtual https://data.world/events/summit/
14. Sep 26-30 • SIAM Conference on Mathematics of Data Science (MDS22) • San Diego, CA, USA https://www.siam.org/conferences/cm/conference/mds22
15. Sep 29 • Data2030 Summit 2022 • Stockholm, Sweden + Virtual https://data2030summit.com
162 views, 04:01
2022-08-30 07:36:26 Need to log Python application events? There is a special module!
Python's logging library (https://docs.python.org/3/library/logging.html) defines functions and classes that implement a flexible event logging system for applications and libraries. The key benefit of having the logging API provided by a standard library module is that all Python modules can participate in logging. As a result, a Python application's log can include its own messages interleaved with messages from third-party modules.
The module consists of the following classes:
• Loggers expose the interface that application code uses directly.
• Handlers send log records (created by loggers) to the appropriate destination.
• Filters provide finer-grained control over which log records to output.
• Formatters specify the layout of log records in the final output.
The level of a log message indicates its severity, i.e. how important that message is. The standard levels, in increasing order of severity, are DEBUG, INFO, WARNING, ERROR, and CRITICAL, so DEBUG has the lowest priority and CRITICAL the highest. If a logger is set to the DEBUG level, all of our messages will be logged, since DEBUG is the lowest level. Alternatively, you can configure logging to handle only ERROR and CRITICAL events.
Code example: https://medium.com/@DavidElvis/logging-for-ml-systems-1b055005c2c2
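A minimal sketch of how the four classes fit together. The logger name `ml_pipeline`, the `DropSecrets` filter, and the in-memory stream (standing in for a real file or console) are all hypothetical choices made for illustration:

```python
import io
import logging

# Logger: the interface that application code calls
logger = logging.getLogger("ml_pipeline")
logger.setLevel(logging.DEBUG)  # the logger itself accepts every level from DEBUG up
logger.propagate = False        # keep records away from any root-logger handlers

# Handler: sends records to a destination (an in-memory stream here)
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setLevel(logging.ERROR)  # this handler only emits ERROR and CRITICAL records

# Formatter: controls the layout of each record in the final output
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

# Filter: finer-grained control over which records get emitted
class DropSecrets(logging.Filter):
    """Hypothetical filter that rejects any record mentioning a password."""
    def filter(self, record):
        return "password" not in record.getMessage()

handler.addFilter(DropSecrets())
logger.addHandler(handler)

logger.debug("loaded 1000 rows")              # below the handler's level: not emitted
logger.error("model failed to converge")      # emitted
logger.critical("password leaked in config")  # severe enough, but rejected by the filter

print(stream.getvalue())
```

Only the ERROR record reaches the stream: the DEBUG record falls below the handler's level, and the CRITICAL record is blocked by the filter. Setting the level on the handler rather than on the logger lets other handlers (for example, a debug log file) still receive the lower-level records.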
143 views, 04:36
2022-08-26 08:36:14
#test
The probability that the test correctly rejects the null hypothesis when a specific alternative hypothesis is true is called
Anonymous Quiz
61%: power of a binary hypothesis test
38%: type II error
1%: confusion matrix
0%: random value
71 voters, 287 views, 05:36
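To make the term concrete: the power of a test is the probability of rejecting H0 when a given alternative is true, and 1 - power is the type II error rate. A minimal sketch for the simplest case, a one-sided one-sample z-test with known unit variance, using only the standard library. The function name and the chosen effect size and sample size are illustrative assumptions:

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a one-sided one-sample z-test of H0: mu = 0 vs H1: mu = effect_size (sigma = 1)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # H0 is rejected when the z statistic exceeds this
    # Under H1 the z statistic is normal with mean effect_size * sqrt(n) and unit variance,
    # so power = P(Z + effect_size * sqrt(n) > z_crit)
    return 1 - std_normal.cdf(z_crit - effect_size * n ** 0.5)

print(round(z_test_power(0.5, 25), 3))
```

With `effect_size = 0` the alternative coincides with H0, so the "power" collapses to alpha; increasing the sample size or the effect size increases the power.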
2022-08-24 09:21:36 Python library for calendar operations
Python includes a built-in calendar module that provides operations related to dates and days of the week. By default, its functions and classes follow the European convention, where Monday is the first day of the week and Sunday is the last.
To use the module, first import it into your code:
import calendar
You can then use its functions and attributes, for example to get the names of the months as a list:
month_names = list(calendar.month_name[1:])
print(month_names)
https://docs.python.org/3/library/calendar.html
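A few more functions from the same module, as a small sketch with arbitrarily chosen dates. `TextCalendar(firstweekday=calendar.SUNDAY)` shows one way to override the default Monday-first convention without changing the module's global setting. Day names and abbreviations depend on the current locale; the comments assume an English (C) locale:

```python
import calendar

print(calendar.isleap(2024))          # True: 2024 is a leap year
print(calendar.weekday(2022, 8, 24))  # 2: weekdays are numbered from Monday = 0, so 2 is Wednesday
print(calendar.day_name[calendar.weekday(2022, 8, 24)])  # locale-dependent day name

# monthrange returns (weekday of the first day, number of days in the month)
first_weekday, days = calendar.monthrange(2022, 2)
print(first_weekday, days)            # 1 28: February 2022 starts on a Tuesday and has 28 days

# A text calendar that starts weeks on Sunday (US convention) instead of Monday
sunday_first = calendar.TextCalendar(firstweekday=calendar.SUNDAY).formatmonth(2022, 9)
print(sunday_first)
```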
341 views, 06:21
2022-08-22 09:28:03 Instead of loops: 3 Python life hacks
Developers and data scientists know that explicit loops in Python can be slow. Here are some alternatives you can use instead:
• Map: applies a function to each value of an iterable object (list, tuple, etc.).
• Filter: filters values of an iterable object (list, tuple, set, etc.). The filtering condition is defined in a function that is passed as an argument to filter.
• Reduce: this function works a bit differently from map and filter. It applies a two-argument function cumulatively to the values of an iterable object and returns a single value. In Python 3, it must be imported from the functools module.
Examples: https://medium.com/codex/3-most-efficient-yet-underutilized-functions-in-python-d865ffaca0bb
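A minimal sketch of the three functions applied to a hypothetical list of numbers:

```python
from functools import reduce  # reduce is not a built-in in Python 3

numbers = [1, 2, 3, 4, 5]

# map: apply a function to every item
squares = list(map(lambda x: x * x, numbers))

# filter: keep only the items for which the function returns True
evens = list(filter(lambda x: x % 2 == 0, numbers))

# reduce: fold the items pairwise into a single value (here, a running sum starting at 0)
total = reduce(lambda acc, x: acc + x, numbers, 0)

print(squares)  # [1, 4, 9, 16, 25]
print(evens)    # [2, 4]
print(total)    # 15
```

map and filter return lazy iterators, hence the `list(...)` calls; the initial value `0` passed to reduce also makes it safe on an empty list.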
423 views, 06:28
2022-08-19 07:58:25 Instead of Jupyter Notebook: Benefits of Deepnote
Jupyter notebooks have been widely used for data analytics and ML for many years. However, despite their popularity, these research tools have significant drawbacks:
• Difficult code versioning. Jupyter notebooks are stored as large JSON files, so merging two notebooks is next to impossible, as is working with them in the Git-like version control tools that developers normally use.
• Weak IDE support for code highlighting and tooltips. A data scientist is usually not a professional software developer, so tools that check code quality and help improve it are very important.
• Difficulty with test-driven development. The popular test-driven development (TDD) methodology is almost impossible to apply in Jupyter notebooks, which makes them hard to use in data pipelines.
• Non-linear workflow. Cells can be run in any order, jumping from one to another, which can make an experiment irreproducible. This interactive way of writing code and navigating between cells is both one of Jupyter Notebook's best features and its biggest weakness.
• Jupyter is not well suited to long-running asynchronous data processing tasks.

Deepnote, an alternative to Jupyter Notebook, addresses many of these shortcomings. Like Jupyter, Deepnote is an interactive notebook for data science work, but it offers a number of advantages over its competitor:
• Real-time collaboration. As in Google Docs, you can share a link to your notebook with colleagues, giving each the desired level of access (view, execute, comment, edit, or full control). Collaborators can also leave comments on a Deepnote cell, so there is no need to switch to a separate messaging application to give feedback on code. With access to the code, managers and other stakeholders can easily follow the progress of development throughout its life cycle.
• Easy environment management. Deepnote takes care of installing modules and setting up the Python environment, including the Python version. In addition to Python, Deepnote also supports SQL queries.
• Embedding. Deepnote can embed code blocks in blogs and other sites without requiring a separate GitHub project created specifically for this purpose. A Deepnote cell can be embedded as code only, output only, or both code and output.
• Data visualization. In Jupyter notebooks, exploratory data analysis (EDA) almost always requires explicit plotting code. Deepnote provides a visualization tool in the notebook itself: a chart block lets you build charts as you would in Python, but without writing code.
• Saving time and money. Since Deepnote handles code management and execution, teams do not need to commit their code pipelines to tools like GitHub or Bitbucket, which can reduce operating costs.
Try it for free: https://deepnote.com/
610 views, edited, 04:58
2022-08-17 08:15:11 10 Best Practices for Naming Tables and Fields in a Database
If every developer and analyst followed these simple rules, reverse engineering would become a hobby, not a laborious job. To make working with the database easier for you and your colleagues, try the following:
1. Separate words with underscores if the name of a column or table consists of 2 or more words, for example word_count. This snake_case style is more readable than camelCase and avoids platform dependency, since some databases change the case of unquoted identifiers.
2. Use full, meaningful names for tables and columns, without reference to data types. Saving a couple of characters will do nothing but cause confusion. Abbreviations are acceptable only where they are understood by everyone.
3. Write column names in lowercase to distinguish them from upper-case SQL keywords. This will also improve your typing speed.
4. Do not use numbers in the names of tables and columns.
5. Name the tables clearly, but briefly.
6. Name tables and columns in the singular. For example, author instead of authors.
7. Name linking tables after the tables they connect, in alphabetical order, for example author_book.
8. When creating an index, include its table and column names in the index name, for example CREATE INDEX person_ix_first_name_last_name ON person (first_name, last_name);
9. Prefix Boolean column names with is_ or has_, for example is_admin or has_membership.
10. Add the suffix _at or _time to date-time column names, for example order_at or order_time.
https://dev.to/mohammadfaisal/how-to-design-a-clean-database-1e83
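As an illustration, here is a hypothetical author/book schema that follows the rules above, created in an in-memory SQLite database through Python's built-in `sqlite3` module. All table and column names are made up for the example; since SQLite has no native Boolean type, `is_active` is stored as an integer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (                          -- singular, lowercase table name
    author_id   INTEGER PRIMARY KEY,
    first_name  TEXT NOT NULL,                 -- words separated by underscores
    last_name   TEXT NOT NULL,
    is_active   INTEGER NOT NULL DEFAULT 1,    -- Boolean flag: is_ prefix
    created_at  TEXT NOT NULL                  -- date-time: _at suffix
);
CREATE TABLE book (
    book_id       INTEGER PRIMARY KEY,
    title         TEXT NOT NULL,
    published_at  TEXT
);
-- linking table: the two table names joined in alphabetical order
CREATE TABLE author_book (
    author_id INTEGER NOT NULL REFERENCES author (author_id),
    book_id   INTEGER NOT NULL REFERENCES book (book_id),
    PRIMARY KEY (author_id, book_id)
);
-- index name: table name + ix + indexed column names
CREATE INDEX author_ix_first_name_last_name ON author (first_name, last_name);
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
# Skip SQLite's internal auto-indexes (e.g. the one behind the composite primary key)
indexes = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND name NOT LIKE 'sqlite_%'")]

print(tables)   # ['author', 'author_book', 'book']
print(indexes)  # ['author_ix_first_name_last_name']
```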
358 views, 05:15
2022-08-15 06:55:40
3 types of data anomalies
371 views, 03:55
2022-08-15 06:55:15 3 types of data anomalies
Data analysts and machine learning practitioners often need to detect anomalies in data, i.e. find patterns that do not conform to expected behavior. There are 3 types of anomalies:
• Point anomaly: a single data point (observation) lies far from the rest and represents an extreme value, irregularity, or deviation that occurs randomly and is unrelated to the overall pattern in the data. A point anomaly is also called a global outlier, because it differs significantly from the rest of the dataset.
• Contextual anomaly: an instance is anomalous only within a particular context. For example, in time-series data that record some quantity over time, the context is temporal. Data points that differ greatly from other data in the same context are called contextual outliers. Suppose that over the past 20 years the number of cars passing through a checkpoint on a regional border in March has averaged 1 thousand, while in June, when the vacation season begins, it reaches 8 thousand. A count of 9 thousand at the beginning of March would be considered an anomaly, whereas in summer it would not. Similarly, retailers commonly see a surge of shoppers during the holiday season, but a sharp increase in sales outside holidays or promotional sales can be called a contextual outlier.
• Collective anomaly: a group of correlated, interrelated, or sequential instances differs significantly from the rest of the data, so these data points are judged anomalous as a group. For time-series data, this may look like typical peaks and troughs occurring outside the time period in which that seasonal pattern is normal, or like a set of time series that are simultaneously in an outlier state. For example, many companies experience a drop in sales at the same time, although there had previously been an upward trend.
https://medium.com/datadailyread/types-of-data-anomalies-2f6fb1747eb1
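A minimal sketch of how the first two types differ in practice, using only the standard library. The z-score rule, the thresholds, and all the data (including monthly car counts modeled loosely on the checkpoint example above) are illustrative assumptions rather than a prescribed method; collective anomalies are not covered here:

```python
from statistics import mean, pstdev, stdev

def point_anomalies(values, threshold=2.0):
    """Flag values whose z-score against the whole dataset exceeds the threshold (global outliers)."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if sigma > 0 and abs(v - mu) / sigma > threshold]

def is_contextual_anomaly(history, context, value, threshold=3.0):
    """Compare a value only with past observations from the same context (e.g. the same month)."""
    past = history[context]
    mu, sigma = mean(past), stdev(past)
    return abs(value - mu) / sigma > threshold

# Hypothetical sensor readings with one extreme value
readings = [10, 12, 11, 13, 12, 95, 11, 10]
print(point_anomalies(readings))  # [95]

# Hypothetical checkpoint car counts for the same month over past years
car_counts = {
    "March": [1000, 1100, 900, 1050, 950, 1000, 1000],
    "June": [7000, 9000, 8000, 8500, 7500, 9500, 6500],
}
print(is_contextual_anomaly(car_counts, "March", 9000))  # True: far outside the usual March range
print(is_contextual_anomaly(car_counts, "June", 9000))   # False: normal for the vacation season
```

The same count of 9,000 is flagged or not depending only on its context, which is exactly what distinguishes a contextual anomaly from a point anomaly.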
366 views, 03:55
2022-08-13 05:55:59
#test
The Student's t-test is applied to data that follow a
Anonymous Quiz
64%: Gaussian distribution
4%: Bayesian distribution
5%: Pareto distribution
28%: any distribution
109 voters, 396 views, 02:55
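The classical Student's t-test assumes approximately normally (Gaussian) distributed data. A minimal sketch of the one-sample t statistic using only the standard library; the sample values and the hypothesized mean are hypothetical. Turning t into a p-value requires the CDF of the t distribution, which the standard library does not provide (`scipy.stats.ttest_1samp` computes both):

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t statistic for H0: population mean == mu0 (assumes roughly normal data)."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # sample standard deviation divided by sqrt(n)
    return (mean(sample) - mu0) / standard_error

# Hypothetical measurements
sample = [5.1, 4.9, 5.3, 5.0, 5.2]
t = one_sample_t(sample, 5.0)
print(round(t, 3))  # 1.414; compare against a t distribution with n - 1 = 4 degrees of freedom
```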