
Data Scientology

Channel address: @datascientology
Categories: Uncategorized
Language: English
Subscribers: 1.26K
Description from channel

Hot data science related posts every hour. Chat: https://telegram.me/r_channels
Contacts: @lgyanf

Ratings & Reviews

1.67

3 reviews


5 stars: 0
4 stars: 0
3 stars: 0
2 stars: 2
1 star: 1

Latest messages: 122

2021-12-17 18:15:19
Understanding Candlesticks [Infographic]

/r/Infographics
https://redd.it/qpkjk7
62 views, 15:15
2021-12-17 17:14:51
How do you go about coming up with ideas for personal projects?

I'm hoping to take some time over the holidays to start a new and (hopefully) interesting personal project. I've never tried to create a project outside of an academic setting before, and I'm a little lost on where/how to start coming up with ideas.

A bit of Googling showed the same few project ideas coming up (credit card fraud detection, road sign classification, etc.), and ideally I'd like a fresh take rather than a recycled idea.

I'm hoping that this would be a good chance to learn/practice new tools or skills that would be helpful in either academic or professional settings (I'm an undergrad statistics student) and have something nice to add to my GitHub as a bonus.

If anyone has tips or suggestions, I'd really appreciate it. Thanks in advance!

/r/datascience
https://redd.it/ri49bx
71 views, 14:14
2021-12-17 16:14:45
[OC] After my girlfriend had kidney stones last year (with severe and painful crises), I started tracking her daily water consumption with a simple form. [She can convert it into gifts, with the 4th bottle being worth exponentially more than the first]

/r/dataisbeautiful
https://redd.it/rie9wi
66 views, 13:14
2021-12-17 15:14:58
[Casual] Choose your favorite Japanese anime from the list (anyone)
https://forms.gle/vqiwzQW3n1Kt6kFQ7

/r/SampleSize
https://redd.it/ri7e7l
27 views, 12:14
2021-12-17 13:15:07
[OC] LatAm's giants are being replaced by new players.

/r/dataisbeautiful
https://redd.it/rhuedn
42 views, 10:15
2021-12-17 12:15:01
Countries that have at one point had a colony in the Americas

/r/MapPorn
https://redd.it/ri0abp
53 views, 09:15
2021-12-17 11:15:04
[OC] Fasten your seatbelts: Where flight turbulence is going to increase thanks to climate change

/r/dataisbeautiful
https://redd.it/rhqxno
53 views, 08:15
2021-12-17 10:14:40
Does high pay = harder work, longer hours?

How many of you are making 110k+, working 30-40 hrs a week, and generally have a low stress job?

I've got a cushy job. I work about 35 hrs/wk, managing 3 analysts who do Excel and SQL+Tableau. I make 75k, low cost of living area, fully remote, unlimited PTO. I could actually do my job passably in 20 hrs a week; only my pride and desire to advance keep me working.

I've got a Master's in Analytics, and could start down a path of data science "proper"-- building and deploying predictive models, building SWE skills, etc. But my work+life balance rocks. I'm afraid to give up this job and then never find another like it.

With 6 YOE, management experience, and a MS, I could easily make 6 figures somewhere. What are the odds that if I switch jobs a couple times, I'll eventually find something like what I have now, but with better pay?

Would I be crazy to leave what I have?

Edit: thanks for the comments, please keep them coming. Thus far, mostly people are telling me that it is doable: you CAN have it all. Dissenting opinions welcomed.

/r/datascience
https://redd.it/ri6sa6
55 views, 07:14
2021-12-17 09:14:59
Training 100000+ models in parallel using MLlib in PySpark

Big data noob here - please don't judge :)

I am developing a forecast module for a portfolio of 4000+ products (the count of products will increase as we expand the portfolio). We are currently experimenting with random forest & XGBoost models as an alternative to traditional time series models.

For each product, the model is first trained on the historical data and future predictions are generated. Is it possible to parallelize the process of modeling across multiple products through any functionality in PySpark?

I came across the use of pandas UDFs for a similar use case, but I would like to know if there's a better approach out there. I did come across a similar post on StackOverflow, but it is in Scala.

Any help is highly appreciated!

/r/bigdata
https://redd.it/r6cli9
49 views, 06:14
2021-12-17 08:14:38
[Discussion] Do undergrads really understand statistical inference?

I posted this on an econometrics subreddit, but I want to also ask the more general statistics community. I hope that's okay! Note, I'm not an instructor, but I'm starting my Ph.D. soon in hopes of teaching econometrics!

For students: When you took your first statistical inference course, did you REALLY understand statistical inference by the end of the course?

And for professors: Do you think the average student truly understands statistical inference?

By "really" and "truly", I'm not talking about being able to robotically calculate statistics: t-stat, p-value, CI and knowing when to reject and fail to reject the null... the stuff that gets you an A+ in Stats 101 if you can repeat them on the test. I got an A+ in intro statistics because we had past finals/midterms and I just "memorized" how to do it on the exam. The exam was literally the same as a practice midterm just different numbers/scenarios.

I'm talking about the bigger picture of statistical inference. If you asked your average introductory statistics student (or even just the A+ students) the more philosophical and epistemological questions, would they be able to give a confident answer?

1. Describe the big picture of statistical inference, what is it, what is it trying to do?
2. What does it mean that the sample estimate has a distribution? What does it mean that the estimator is a random variable itself? Why is this a problem?
3. What is a hypothesis test doing? Why are you undertaking it? Why do people do it? What information does it give you that you didn't know before? What does it mean that you're assuming something to be true? What does it mean about your assumption when you reject/fail-to-reject the null?
4. What does it mean that you do not know the parameter and can never truly know what it is? How does this affect your ability to understand how some random process works?

In my experience, most students get so lost in learning the methods of calculation that they neglect to appreciate the bigger picture. It's like a student getting an A+ in Calculus because they can apply all the derivative rules perfectly but somehow never really understood that a derivative is the rate of change.

Most of what intro stats students are evaluated on is stuff a computer can do for them. They are rarely, if ever, truly tested on the epistemological value of knowing how statistical inference works.
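
Question 2 in the list above (what it means for a sample estimate to have a distribution) can be illustrated with a small simulation. The following is only a minimal sketch and not part of the original post: the function name `sample_means`, the normal population with mean 50 and standard deviation 10, the sample size of 25, and the fixed seed are all assumptions chosen for illustration.

```python
import random
import statistics


def sample_means(n_samples, sample_size, mu=50.0, sigma=10.0, seed=0):
    """Draw repeated samples from a normal population; return each sample's mean.

    mu and sigma describe a hypothetical population chosen for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


means = sample_means(n_samples=2000, sample_size=25)

# The estimator (the sample mean) takes a different value for every sample,
# so it is itself a random variable with its own distribution.
print("smallest and largest sample mean:", min(means), max(means))

# Theory says the spread of that distribution should be close to
# sigma / sqrt(sample_size) = 10 / 5 = 2 (the standard error).
print("standard deviation of the sample means:", statistics.stdev(means))
```

Each individual sample gives only one draw from this distribution, which is why a single estimate cannot be taken as the true parameter.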

/r/statistics
https://redd.it/ri7nmv
52 views, 05:14