
Data Scientology

Channel address: @datascientology
Categories: Uncategorized
Language: English
Subscribers: 1.26K
Description from channel

Hot data science related posts every hour. Chat: https://telegram.me/r_channels
Contacts: @lgyanf

Ratings & Reviews

1.67 (3 reviews)


5 stars: 0
4 stars: 0
3 stars: 0
2 stars: 2
1 star: 1


The latest messages

2022-06-03 01:14:44
[OC] Gun Death Rates by State

/r/dataisbeautiful
https://redd.it/v3cwh6
54 views, 22:14
2022-06-02 23:14:53
[OC] Web browsers over the last 28 years

/r/dataisbeautiful
https://redd.it/v3c9nv
58 views, 20:14
2022-06-02 21:14:46
Looking for US max high/low temperatures by zip code

Trying to identify zip codes that had temperatures above X in the last 12 months. I'm seeing datasets where I can download historical data for individual zip codes, but I'm having trouble finding a dataset that lets me identify zip codes known to rise above certain thresholds. Thanks for your help.
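Once a daily max-temperature dataset has been downloaded, the filtering step the post asks about reduces to a per-ZIP maximum over a 12-month window. A minimal sketch, assuming the data has already been reshaped into hypothetical `(zip_code, date, max_temp_f)` tuples; the records and the `zips_above` helper are illustrative only, not taken from any real dataset:

```python
from datetime import date, timedelta

# Hypothetical toy records: (zip_code, observation_date, max_temp_f).
# Real data (e.g. station readings mapped to ZIP codes) would have to be
# downloaded and joined first; this only illustrates the threshold filter.
records = [
    ("98101", date(2022, 5, 20), 78.0),
    ("85001", date(2021, 7, 11), 116.0),
    ("85001", date(2022, 5, 30), 104.0),
    ("10001", date(2021, 8, 2), 95.0),
    ("10001", date(2021, 5, 1), 101.0),  # falls outside the window used below
    ("99501", date(2021, 7, 15), 72.0),
]


def zips_above(records, threshold, as_of, days=365):
    """Return ZIP codes whose max temperature exceeded `threshold`
    at least once in the `days` days ending at `as_of` (inclusive)."""
    start = as_of - timedelta(days=days)
    peak = {}
    for zip_code, day, max_temp in records:
        if start <= day <= as_of:
            peak[zip_code] = max(peak.get(zip_code, float("-inf")), max_temp)
    return sorted(z for z, t in peak.items() if t > threshold)


print(zips_above(records, threshold=100, as_of=date(2022, 6, 2)))  # ['85001']
```

Keeping only the per-ZIP peak means the whole history never has to be held per ZIP code, so the same loop works when streaming rows from a large CSV.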

/r/datasets
https://redd.it/uyho41
62 views, 18:14
2022-06-02 20:14:42
Project BFLOAT16 on ALL hardware (>= 2009), up to 2000x faster ML algos, 50% less RAM usage for all old/new hardware - Hyperlearn Reborn.

Hello everyone!! It's been a while!! Years back I released Hyperlearn (https://github.com/danielhanchen/hyperlearn), which has 1.2K GitHub stars and made tonnes of algos faster.

I was a bit busy at NVIDIA and with my startup, but I've been casually developing some algos. The question is: are people still interested in fast algorithms? Does anyone want to collaborate on reviving Hyperlearn (or making a NEW package)? Note the current package is ahhh A MESS... I'm fixing it, sit tight!!

NEW algos for release:

1. PCA with 50% less memory usage and ZERO data corruption!! (Maths tricks :) i.e. no need to do X - X.mean()!!!) How, you may ask???!
2. Randomized PCA with 50% less memory usage (i.e. no need to do X - X.mean()).
3. Linear Regression is EVEN faster, now with Pivoted Cholesky making the algo 100% stable. To my knowledge, no package on the internet has pivoted Cholesky solvers.
4. Bfloat16 on ALL hardware, all the way down to SSE4!!! (Intel Core i7, 2009!!)
5. Matrix multiplication with Bfloat16 on ALL hardware! Not the cheap 2x-extra-memory copying trick: true zero extra RAM usage with on-the-fly CPU conversion.
6. New Paratrooper Optimizer, which trains neural nets 50% faster using the latest fast algos.
7. Sparse blocked matrix multiplication on ALL hardware (NNs)!!
8. Super fast neural net training with batched multiprocessing (i.e. while the NN is doing backprop on batch 1, batch 2 is already being loaded, etc.).
9. Super fast softmax, making attention softmax(Q @ K.T / sqrt(d))V super fast, with all operations using the fastest possible matrix multiplication config (tall skinny, square matrices).
10. AND MORE!!!
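bfloat16 keeps float32's sign bit and 8-bit exponent but only 7 mantissa bits, so a bfloat16 value is just the upper 16 bits of a float32. That is why conversion needs nothing beyond integer shifts, which any CPU supports. The pure-Python sketch below illustrates the format with round-to-nearest-even; the helper names (`float_to_bf16_bits`, `bf16_bits_to_float`, `bf16_dot`) are hypothetical, and it makes no claim about how Hyperlearn's vectorized kernels are actually written.

```python
import struct
from array import array


def float_to_bf16_bits(x: float) -> int:
    """Round a float (via float32) to bfloat16, returned as a 16-bit int."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        # NaN: keep the sign/exponent and force a quiet-NaN mantissa bit
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Round to nearest, ties to even, on the 16 low bits that get dropped
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Widen bfloat16 back to float32 by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def bf16_dot(a_bits, b_bits) -> float:
    # Decode each element on the fly (no widened copy of the inputs)
    # and accumulate at full precision.
    return sum(bf16_bits_to_float(x) * bf16_bits_to_float(y)
               for x, y in zip(a_bits, b_bits))


# Storage: 2 bytes per element in an unsigned-short array
a = array("H", (float_to_bf16_bits(v) for v in (1.0, 2.0, 0.5)))
b = array("H", (float_to_bf16_bits(v) for v in (4.0, 0.25, 8.0)))
print(bf16_dot(a, b))  # 8.5
```

Values with at most 8 significant bits (like those above) survive the round trip exactly; pi, for example, comes back as 3.140625.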

Old algos made faster:

1. 70% less time to fit Least Squares / Linear Regression than sklearn + 50% less memory usage
2. 50% less time to fit Non Negative Matrix Factorization than sklearn due to new parallelized algo
3. 40% faster full Euclidean / Cosine distance algorithms
4. 50% less time LSMR iterative least squares
5. 50% faster Sparse Matrix operations - parallelized
6. RandomizedSVD is now 20 - 30% faster
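For reference, the textbook way to fit least squares with a Cholesky solver is to factor the normal equations X^T X beta = X^T y as X^T X = L L^T and then solve two triangular systems. The sketch below is plain, unpivoted Cholesky in pure Python with hypothetical helper names, not Hyperlearn's implementation; a pivoted variant (as in item 3 of the new algos list) adds symmetric pivoting so rank-deficient X^T X can be handled, and LAPACK exposes such a routine as `?pstrf`.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L^T (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0:
            # Unpivoted Cholesky breaks down on singular / rank-deficient input
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def lstsq_cholesky(X, y):
    """Solve min ||X beta - y|| via the normal equations and Cholesky."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    L = cholesky(xtx)
    # Forward substitution: L z = X^T y
    z = [0.0] * p
    for i in range(p):
        z[i] = (xty[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    # Back substitution: L^T beta = z
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (z[i] - sum(L[k][i] * beta[k] for k in range(i + 1, p))) / L[i][i]
    return beta


# Intercept column of ones plus one feature; data lie exactly on y = 1 + 2x
print(lstsq_cholesky([[1, 0], [1, 1], [1, 2], [1, 3]], [1, 3, 5, 7]))
```

The design trade-off: forming X^T X is cheap when there are many more rows than columns, but it squares the condition number of X, which is why QR- or SVD-based solvers are often preferred for ill-conditioned data.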

Also, you might remember my 50-page machine learning book: https://drive.google.com/file/d/18fxyBiPE0G4e5yixAj5S--YL_pgTh3Vo/view?usp=sharing

https://preview.redd.it/vmmiocvvk7391.png?width=1793&format=png&auto=webp&s=d2c26b4f2fbbfcd007b44d528579a271ea7960cc

/r/MachineLearning
https://redd.it/v38pwm
58 views, 17:14
2022-06-02 19:14:52
```json
    "authors":[
      "Jakob Svensson"
    ]
  },
  ... other publications
  {
    "title":"Coffee In Chhattisgarh", # last publication
    "link":"https://www.researchgate.netpublication/353118247_COFFEE_IN_CHHATTISGARH?_sg=CsJ66DoWjFfkMNdujuE-R9aVTZA4kVb_9lGiy1IrYXls1Nur4XFMdh2s5E9zkF5Skb5ZZzh663USfBA",
    "source_link":"https://www.researchgate.netNone",
    "publication_type":"Technical Report",
    "publication_date":"Jul 2021",
    "publication_doi":null,
    "publication_isbn":null,
    "authors":[
      "Krishan Pal Singh",
      "Beena Nair Singh",
      "Dushyant Singh Thakur",
      "Anurag Kerketta",
      "Shailendra Kumar Sahu"
    ]
  }
]
```

A step-by-step explanation is available on the SerpApi blog: https://serpapi.com/blog/web-scraping-all-researchgate-publications-in-python/#code-explanation

/r/datasets
https://redd.it/v2bx0w
54 views, 16:14
2022-06-02 19:14:49
[Script] Scraping all ResearchGate publications

```python
import json
from urllib.parse import quote_plus, urljoin

from parsel import Selector
from playwright.sync_api import sync_playwright

BASE_URL = "https://www.researchgate.net/"


def scrape_researchgate_publications(query: str):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True, slow_mo=50)
        page = browser.new_page(user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36")

        publications = []
        page_num = 1

        while True:
            # URL-encode the query so multi-word searches work
            page.goto(f"https://www.researchgate.net/search/publication?q={quote_plus(query)}&page={page_num}")
            selector = Selector(text=page.content())

            cards = selector.css(".nova-legacy-c-card__body--spacing-inherit")
            if not cards:
                # no results on this page (or the markup changed): stop instead of looping forever
                break

            for publication in cards:
                title = publication.css(".nova-legacy-v-publication-item__title .nova-legacy-e-link--theme-bare::text").get()
                # hrefs are relative ("publication/..."), so join them onto the site root;
                # a missing link stays None instead of becoming "https://www.researchgate.netNone"
                title_href = publication.css(".nova-legacy-v-publication-item__title .nova-legacy-e-link--theme-bare::attr(href)").get()
                source_href = publication.css(".nova-legacy-v-publication-item__preview-source .nova-legacy-e-link--theme-bare::attr(href)").get()
                publication_type = publication.css(".nova-legacy-v-publication-item__badge::text").get()
                publication_date = publication.css(".nova-legacy-v-publication-item__meta-data-item:nth-child(1) span::text").get()
                publication_doi = publication.css(".nova-legacy-v-publication-item__meta-data-item:nth-child(2) span").xpath("normalize-space()").get()
                publication_isbn = publication.css(".nova-legacy-v-publication-item__meta-data-item:nth-child(3) span").xpath("normalize-space()").get()
                authors = publication.css(".nova-legacy-v-person-inline-item__fullname::text").getall()

                publications.append({
                    "title": title.title() if title else None,
                    "link": urljoin(BASE_URL, title_href) if title_href else None,
                    "source_link": urljoin(BASE_URL, source_href) if source_href else None,
                    "publication_type": publication_type,
                    "publication_date": publication_date,
                    "publication_doi": publication_doi,
                    "publication_isbn": publication_isbn,
                    "authors": authors,
                })

            print(f"page number: {page_num}")

            # the next-page arrow gets a `rel` attribute when it is greyed out (inactive),
            # which marks the last page
            if selector.css(".nova-legacy-c-button-group__item:nth-child(9) a::attr(rel)").get():
                break
            page_num += 1

        print(json.dumps(publications, indent=2, ensure_ascii=False))

        browser.close()

    return publications


scrape_researchgate_publications(query="coffee")
```


Outputs:

```json
[
  {
    "title":"The Social Life Of Coffee Turkey’S Local Coffees",
    "link":"https://www.researchgate.netpublication/360540595_The_Social_Life_of_Coffee_Turkey%27s_Local_Coffees?_sg=kzuAi6HlFbSbnLEwtGr3BA_eiFtDIe1VEA4uvJlkBHOcbSjh5XlSQe6GpYvrbi12M0Z2MQ6grwnq9fI",
    "source_link":"https://www.researchgate.netpublication/360540595_The_Social_Life_of_Coffee_Turkey%27s_Local_Coffees?_sg=kzuAi6HlFbSbnLEwtGr3BA_eiFtDIe1VEA4uvJlkBHOcbSjh5XlSQe6GpYvrbi12M0Z2MQ6grwnq9fI",
    "publication_type":"Conference Paper",
    "publication_date":"Apr 2022",
    "publication_doi":null,
    "publication_isbn":null,
    "authors":[
      "Gülşen Berat Torusdağ",
      "Merve Uçkan Çakır",
      "Cinucen Okat"
    ]
  },
  {
    "title":"Coffee With The Algorithm",
    "link":"https://www.researchgate.netpublication/359599064_Coffee_with_the_Algorithm?_sg=3KHP4SXHm_BSCowhgsa4a2B0xmiOUMyuHX2nfqVwRilnvd1grx55EWuJqO0VzbtuG-16TpsDTUywp0o",
    "source_link":"https://www.researchgate.netNone",
    "publication_type":"Chapter",
    "publication_date":"Mar 2022",
    "publication_doi":"DOI: 10.4324/9781003170884-10",
    "publication_isbn":"ISBN: 9781003170884",
```
45 views, 16:14
2022-06-02 18:15:34
61 views, 15:15
2022-06-02 18:14:37
Air traffic (ADSB data from my house) over the Seattle area - April 2022

https://redd.it/v39zn6
@datascientology
61 views, 15:14
2022-06-02 05:15:00
[OC] I've been working on a GUI project for data visualization. Here is what I've got so far. (help me decide)

/r/dataisbeautiful
https://redd.it/v2e01r
12 views, 02:15
2022-06-02 04:14:41
What if "Did Not Vote" were a political candidate during the 2020 presidential election?

/r/MapPorn
https://redd.it/v2fbpt
17 views, 01:14