Data Scientology

Categories: Uncategorized

Language: English

Subscribers: 1.26K

Description from channel

Hot data science related posts every hour. Chat: https://telegram.me/r_channels
Contacts: @lgyanf

Ratings & Reviews

1.67

3 reviews

The latest Messages 7

2022-06-03 01:14:44

[OC] Gun Death Rates by State

/r/dataisbeautiful
https://redd.it/v3cwh6

54 views, 22:14

2022-06-02 23:14:53

[OC] Web browsers over the last 28 years

/r/dataisbeautiful
https://redd.it/v3c9nv

58 views, 20:14

2022-06-02 21:14:46 Looking for US max high/low temperatures by zip code

Trying to identify zip codes that had temperatures above X in the last 12 months. I’m seeing datasets where I can download historical data for individual zip codes, but having trouble finding a data set that allows me to identify zip codes that have been known to rise above certain thresholds. Thanks for your help.

/r/datasets
https://redd.it/uyho41

62 views, 18:14

2022-06-02 20:14:42 Project BFLOAT16 on ALL hardware (>= 2009), up to 2000x faster ML algos, 50% less RAM usage for all old/new hardware - Hyperlearn Reborn.

Hello everyone!! It's been a while!! Years back I released Hyperlearn https://github.com/danielhanchen/hyperlearn. It has 1.2K Github stars, where I made tonnes of algos faster.

I was a bit busy back at NVIDIA and my startup, and I've been casually developing some algos. The question is are people still interested in fast algorithms? Does anyone want to collaborate on reviving Hyperlearn? (Or making a NEW package?) Note the current package is ahhh A MESSS... I'm fixing it - sit tight!!

NEW algos for release:

1. PCA with 50% less memory usage with ZERO data corruption!! (Maths tricks :) ie no need to do X - X.mean()!!!) How, you may ask???!
2. Randomized PCA with 50% less memory usage (ie no need to do X - X.mean()).
3. Linear Regression is now EVEN faster, with Pivoted Cholesky making the algo 100% stable. No package on the internet, to my knowledge, has pivoted Cholesky solvers.
4. Bfloat16 on ALL hardware all the way down to SSE4!!! (Intel Core i7 2009!!)
5. Matrix multiplication with Bfloat16 on ALL hardware!! Not the cheap 2x extra memory copying trick - true 0 extra RAM usage, on-the-fly CPU conversion.
6. New Paratrooper Optimizer which trains neural nets 50% faster using the latest fast algos.
7. Sparse blocked matrix multiplication on ALL hardware (NNs) !!
8. Super fast Neural Net training with batched multiprocessing (ie when NN is doing backprop on batch 1, we load batch 2 already etc).
9. Super fast softmax, making attention softmax(Q @ K.T / sqrt(d))V super fast, and all operations use the fastest possible matrix multiplication config (tall skinny, square matrices).
10. AND MORE!!!
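Item 8 in the list above (loading batch 2 while the network is still working on batch 1) can be illustrated with a small standard-library sketch. This is not Hyperlearn's implementation, which the post does not show; `prefetch`, `load_batches`, and `train_step` are hypothetical names, and a background thread stands in for whatever multiprocessing scheme the package actually uses.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel marking the end of the batch stream


def prefetch(batches: Iterable[T], buffer_size: int = 1) -> Iterator[T]:
    """Yield batches while a background thread loads the next ones."""
    buffer: queue.Queue = queue.Queue(maxsize=buffer_size)

    def producer() -> None:
        try:
            for batch in batches:
                buffer.put(batch)  # blocks while the buffer is full
        finally:
            buffer.put(_DONE)  # always signal the consumer to stop

    threading.Thread(target=producer, daemon=True).start()

    while True:
        batch = buffer.get()
        if batch is _DONE:
            return
        yield batch


def load_batches(n: int) -> Iterator[list]:
    # Hypothetical loader standing in for slow disk or network reads.
    for i in range(n):
        yield [i, i + 1]


def train_step(batch: list) -> int:
    # Hypothetical placeholder for a forward and backward pass.
    return sum(batch)


# While train_step runs on one batch, the producer thread fetches the next.
losses = [train_step(batch) for batch in prefetch(load_batches(3))]
print(losses)
```

With `buffer_size=1` at most one batch waits in the buffer, so extra memory stays bounded; a larger buffer trades RAM for more tolerance to uneven load times.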

Old algos made faster:

1. 70% less time to fit Least Squares / Linear Regression than sklearn + 50% less memory usage
2. 50% less time to fit Non Negative Matrix Factorization than sklearn due to new parallelized algo
3. 40% faster full Euclidean / Cosine distance algorithms
4. 50% less time LSMR iterative least squares
5. 50% faster Sparse Matrix operations - parallelized
6. RandomizedSVD is now 20 - 30% faster

Also you might remember my 50 page machine learning book: https://drive.google.com/file/d/18fxyBiPE0G4e5yixAj5S--YL_pgTh3Vo/view?usp=sharing

https://preview.redd.it/vmmiocvvk7391.png?width=1793&format=png&auto=webp&s=d2c26b4f2fbbfcd007b44d528579a271ea7960cc

/r/MachineLearning
https://redd.it/v38pwm

58 views, 17:14

2022-06-02 19:14:52

```json
    "authors":[
      "Jakob Svensson"
    ]
  }, ... other publications
  {
    "title":"Coffee In Chhattisgarh", # last publication
    "link":"https://www.researchgate.netpublication/353118247_COFFEE_IN_CHHATTISGARH?_sg=CsJ66DoWjFfkMNdujuE-R9aVTZA4kVb_9lGiy1IrYXls1Nur4XFMdh2s5E9zkF5Skb5ZZzh663USfBA",
    "source_link":"https://www.researchgate.netNone",
    "publication_type":"Technical Report",
    "publication_date":"Jul 2021",
    "publication_doi":null,
    "publication_isbn":null,
    "authors":[
      "Krishan Pal Singh",
      "Beena Nair Singh",
      "Dushyant Singh Thakur",
      "Anurag Kerketta",
      "Shailendra Kumar Sahu"
    ]
  }
]
```

A step-by-step explanation at SerpApi: https://serpapi.com/blog/web-scraping-all-researchgate-publications-in-python/#code-explanation

/r/datasets
https://redd.it/v2bx0w

54 views, 16:14

2022-06-02 19:14:49 [Script] Scraping ResearchGate all Publications

```python
from parsel import Selector
from playwright.sync_api import sync_playwright
import json

def scrape_researchgate_publications(query: str):
    with sync_playwright() as p:

        browser = p.chromium.launch(headless=True, slow_mo=50)
        page = browser.new_page(user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36")

        publications = []
        page_num = 1

        while True:
            page.goto(f"https://www.researchgate.net/search/publication?q={query}&page={page_num}")
            selector = Selector(text=page.content())

            # each search result card holds one publication
            for publication in selector.css(".nova-legacy-c-card__body--spacing-inherit"):
                title = publication.css(".nova-legacy-v-publication-item__title .nova-legacy-e-link--theme-bare::text").get().title()
                title_link = f'https://www.researchgate.net{publication.css(".nova-legacy-v-publication-item__title .nova-legacy-e-link--theme-bare::attr(href)").get()}'
                publication_type = publication.css(".nova-legacy-v-publication-item__badge::text").get()
                publication_date = publication.css(".nova-legacy-v-publication-item__meta-data-item:nth-child(1) span::text").get()
                publication_doi = publication.css(".nova-legacy-v-publication-item__meta-data-item:nth-child(2) span").xpath("normalize-space()").get()
                publication_isbn = publication.css(".nova-legacy-v-publication-item__meta-data-item:nth-child(3) span").xpath("normalize-space()").get()
                authors = publication.css(".nova-legacy-v-person-inline-item__fullname::text").getall()
                source_link = f'https://www.researchgate.net{publication.css(".nova-legacy-v-publication-item__preview-source .nova-legacy-e-link--theme-bare::attr(href)").get()}'

                publications.append({
                    "title": title,
                    "link": title_link,
                    "source_link": source_link,
                    "publication_type": publication_type,
                    "publication_date": publication_date,
                    "publication_doi": publication_doi,
                    "publication_isbn": publication_isbn,
                    "authors": authors
                })

            print(f"page number: {page_num}")

            # checks if next page arrow key is greyed out `attr(rel)` (inactive) and breaks out of the loop
            if selector.css(".nova-legacy-c-button-group__item:nth-child(9) a::attr(rel)").get():
                break
            else:
                page_num += 1

        print(json.dumps(publications, indent=2, ensure_ascii=False))

        browser.close()

scrape_researchgate_publications(query="coffee")
```
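The sample output for this script contains links such as `https://www.researchgate.netpublication/...` and `https://www.researchgate.netNone`, which is what plain string concatenation produces when the `href` has no leading slash or is missing. A minimal sketch of one way to build the links more defensively, assuming a hypothetical helper `absolute_link` and constant `BASE_URL` that are not part of the original script:

```python
from typing import Optional
from urllib.parse import urljoin

# Assumed base; the trailing slash lets urljoin resolve relative paths.
BASE_URL = "https://www.researchgate.net/"


def absolute_link(href: Optional[str]) -> Optional[str]:
    """Return an absolute URL, or None when the selector found no href."""
    if not href:
        return None
    # urljoin handles hrefs both with and without a leading slash.
    return urljoin(BASE_URL, href)


print(absolute_link("publication/123_Title?_sg=abc"))
print(absolute_link("/publication/123_Title"))
print(absolute_link(None))
```

In the scraper, the two f-strings that build `title_link` and `source_link` could pass the selector result through such a helper, so a missing source link serializes as `null` in the JSON output.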

Outputs:

```json
[
  {
    "title":"The Social Life Of Coffee Turkey’S Local Coffees",
    "link":"https://www.researchgate.netpublication/360540595_The_Social_Life_of_Coffee_Turkey%27s_Local_Coffees?_sg=kzuAi6HlFbSbnLEwtGr3BA_eiFtDIe1VEA4uvJlkBHOcbSjh5XlSQe6GpYvrbi12M0Z2MQ6grwnq9fI",
    "source_link":"https://www.researchgate.netpublication/360540595_The_Social_Life_of_Coffee_Turkey%27s_Local_Coffees?_sg=kzuAi6HlFbSbnLEwtGr3BA_eiFtDIe1VEA4uvJlkBHOcbSjh5XlSQe6GpYvrbi12M0Z2MQ6grwnq9fI",
    "publication_type":"Conference Paper",
    "publication_date":"Apr 2022",
    "publication_doi":null,
    "publication_isbn":null,
    "authors":[
      "Gülşen Berat Torusdağ",
      "Merve Uçkan Çakır",
      "Cinucen Okat"
    ]
  },
  {
    "title":"Coffee With The Algorithm",
    "link":"https://www.researchgate.netpublication/359599064_Coffee_with_the_Algorithm?_sg=3KHP4SXHm_BSCowhgsa4a2B0xmiOUMyuHX2nfqVwRilnvd1grx55EWuJqO0VzbtuG-16TpsDTUywp0o",
    "source_link":"https://www.researchgate.netNone",
    "publication_type":"Chapter",
    "publication_date":"Mar 2022",
    "publication_doi":"DOI: 10.4324/9781003170884-10",
    "publication_isbn":"ISBN: 9781003170884",
```

45 views, 16:14

2022-06-02 18:14:37 Air traffic (ADSB data from my house) over the Seattle area - April 2022

https://redd.it/v39zn6
@datascientology

61 views, 15:14

2022-06-02 05:15:00

[OC] I've been working on a GUI project for data visualization. Here is what I've got so far. (help me decide)

/r/dataisbeautiful
https://redd.it/v2e01r

12 views, 02:15

2022-06-02 04:14:41

What if "Did Not Vote" were a political candidate during the 2020 presidential election?

/r/MapPorn
https://redd.it/v2fbpt

17 views, 01:14
