🔥 Burn Fat Fast. Discover How! 💪

Data Science by ODS.ai 🦜

Logo of telegram channel opendatascience — Data Science by ODS.ai 🦜 D
Logo of telegram channel opendatascience — Data Science by ODS.ai 🦜
Channel address: @opendatascience
Categories: Technologies
Language: English
Subscribers: 51.91K
Description from channel

First Telegram Data Science channel. Covering all technical and popular staff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and the applications of former. To reach editors contact: @haarrp

Ratings & Reviews

2.67

3 reviews

Reviews can be left only by registered users. All reviews are moderated by admins.

5 stars

1

4 stars

0

3 stars

0

2 stars

1

1 stars

1


The latest Messages 4

2023-05-11 20:44:32
StarCoder: may the source be with you!

The BigCode community, an open-scientific collaboration working on the responsible development of Code LLMs, introduces StarCoder and StarCoderBase:
- 15.5B parameter models
- 8K context length
- StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process
- StarCoderBase is fine-tuned on 35B Python tokens, resulting in the creation of StarCoder

StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model.
4.3K views17:44
Open / Comment
2023-05-11 17:57:05 For those how are looking beyond Data Science or wondering to play around, here is a news on the release of the portfolio company of one of the channel editors:

TON Play: the Unity SDK + payment management for games

TON Play is a toolkit for developers based on the TON blockchain and working closely with the messaging app Telegram. They recently introduced Pay-in, Mass payout, and On-demand payout methods in TON. If you dabble with games, this might be curious to test in action.

The main features:

* projects get paid by Telegram users in TON
* option to add mass payouts in TON to games with cash prizes
* automated payouts on user demand

TON Play also released SDKs, allowing projects to manage assets and in-game marketplace and port Unity or HTML5 games to work inside Telegram as a web app. SDKs are written in Unity, Python, and Typescript.

Website: https://tonplay.io/
Documentation: https://docs.tonplay.io/
Telegram channel: https://t.me/tonplayinsider
Contacts: @tonplay_devs, gamedevs@tonplay.io


#ds_jobs #ds_resumes
4.3K views14:57
Open / Comment
2023-05-10 21:13:03
2.9K views18:13
Open / Comment
2023-05-10 15:53:14
Found another PyTorch-based library with basic image functions, losses and transformations

Looks like it is a combination toolkit of augs, skimage and classic cv2 functions, but written in PyTorch.

What is Kornia? Kornia is a differentiable library that allows classical computer vision to be integrated into deep learning models.

Examples:

- https://kornia.readthedocs.io/en/latest/get-started/highlights.html
- and especially this https://kornia.readthedocs.io/en/latest/losses.html
3.5K views12:53
Open / Comment
2023-05-10 07:11:27 ​​ImageBind: One Embedding Space To Bind Them All

Introducing ImageBind, a groundbreaking approach that learns a joint embedding across six different modalities – images, text, audio, depth, thermal, and IMU data – using only image-paired data. This innovative method leverages recent large-scale vision-language models, extending their zero-shot capabilities to new modalities through their natural pairing with images. ImageBind unlocks a myriad of novel emergent applications 'out-of-the-box,' including cross-modal retrieval, composing modalities with arithmetic, cross-modal detection, and generation.

ImageBind's emergent capabilities improve as the strength of the image encoder increases, setting a new state-of-the-art benchmark in emergent zero-shot recognition tasks across modalities, even outperforming specialist supervised models. Furthermore, ImageBind demonstrates impressive few-shot recognition results, surpassing prior work in the field. This pioneering technique offers a fresh way to evaluate vision models for both visual and non-visual tasks, opening the door to exciting advancements in AI and machine learning.

Blogpost link: https://ai.facebook.com/blog/imagebind-six-modalities-binding-ai/

Code link: https://github.com/facebookresearch/ImageBind

Paper link: https://dl.fbaipublicfiles.com/imagebind/imagebind_final.pdf

Demo link: https://imagebind.metademolab.com/

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-imagebind

#deeplearning #nlp #multimodal #cv #embedding
3.9K views04:11
Open / Comment
2023-05-08 07:22:39 ​​Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

Researchers have developed "Distilling step-by-step," a cutting-edge method to train smaller, more efficient task-specific models that outperform large language models (LLMs) while requiring significantly less training data. This innovation promises to revolutionize the practicality of NLP models in real-world applications by reducing both model size and data requirements: a 770M T5 model surpasses a 540B PaLM model using only 80% of available data.

Distilling step-by-step leverages LLM-generated rationales within a multi-task training framework, yielding impressive results across 4 NLP benchmarks. The technique consistently achieves better performance with fewer labeled/unlabeled training examples, surpassing LLMs with substantially smaller model sizes.

Paper link: https://arxiv.org/abs/2305.02301

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-dsbs

#deeplearning #nlp #languagemodels #distillation
4.8K views04:22
Open / Comment
2023-05-07 00:23:06 TWIMC

string2string: A Modern Python Library for String-to-String Algorithms
https://arxiv.org/abs/2304.14395

We introduce string2string, an open-source library that offers a comprehensive suite of efficient algorithms for a broad range of string-to-string problems. It includes traditional algorithmic solutions as well as recent advanced neural approaches to tackle various problems in string alignment, distance measurement, lexical and semantic search, and similarity analysis -- along with several helpful visualization tools and metrics to facilitate the interpretation and analysis of these methods. Notable algorithms featured in the library include the Smith-Waterman algorithm for pairwise local alignment, the Hirschberg algorithm for global alignment, the Wagner-Fisher algorithm for edit distance, BARTScore and BERTScore for similarity analysis, the Knuth-Morris-Pratt algorithm for lexical search, and Faiss for semantic search. Besides, it wraps existing efficient and widely-used implementations of certain frameworks and metrics, such as sacreBLEU and ROUGE, whenever it is appropriate and suitable. Overall, the library aims to provide extensive coverage and increased flexibility in comparison to existing libraries for strings. It can be used for many downstream applications, tasks, and problems in natural-language processing, bioinformatics, and computational social sciences. It is implemented in Python, easily installable via pip, and accessible through a simple API. Source code, documentation, and tutorials are all available on our GitHub page:

https://github.com/stanfordnlp/string2string
5.2K views21:23
Open / Comment
2023-05-06 12:05:06 Last call to apply for the Yandex School of Data Analysis.

Recruitment for the YSDA, which is a vocational training program, free of charge, lasting for two years, will end on the 06 May 2023.

You can choose one of the four highly demanded majors: data science, big data infrastructure, machine learning or data analysis in applied sciences.

To be able to pass examinations and study successfully at the YSDA one should have a basic understanding of the machine learning, have a good mathematical background and use one of the programming languages. Experienced developers can apply for an alternative admission track that includes both assessment of algorithms basics and mathematics and research and/or industrial achievements.

The educational process is mainly conducted in the Russian language.

Application form is accessible via link https://clck.ru/34GwCS, there is also a tg-chat for the applicants https://t.me/+DQ1j7epbIlNmNjFi
6.0K views09:05
Open / Comment
2023-05-04 07:37:59 ​​Phoenix: Democratizing ChatGPT across Languages

Introducing "Phoenix," a revolutionary multilingual ChatGPT that's breaking barriers in AI language models! By excelling in languages with limited resources and demonstrating competitive performance in English and Chinese models, Phoenix is set to transform accessibility for people around the world.

The methodology behind Phoenix combines instructions and conversations data to create a more well-rounded language model, leveraging the multi-lingual nature of the data to understand and interact with diverse languages.

Paper link: https://arxiv.org/abs/2304.10453

Code link: https://github.com/FreedomIntelligence/LLMZoo

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-phoenix-llm

#deeplearning #nlp #Phoenix #ChatGPT #multilingual #languagemodel
7.1K views04:37
Open / Comment
2023-05-01 07:56:16 ​​Scaling Transformer to 1M tokens and beyond with RMT

Imagine extending the context length of BERT, one of the most effective Transformer-based models in natural language processing, to an unprecedented two million tokens! This technical report unveils the Recurrent Memory Transformer (RMT) architecture, which achieves this incredible feat while maintaining high memory retrieval accuracy.

The RMT approach enables storage and processing of both local and global information, allowing information flow between segments of the input sequence through recurrence. The experiments showcase the effectiveness of this groundbreaking method, with immense potential to enhance long-term dependency handling in natural language understanding and generation tasks, as well as enable large-scale context processing for memory-intensive applications.

Paper link: https://arxiv.org/abs/2304.11062
Code link: https://github.com/booydar/t5-experiments/tree/scaling-report

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-rmt-1m

#deeplearning #nlp #bert #memory
8.5K views04:56
Open / Comment