Need data for a new model

#1
by Reality123b - opened
DataMuncher Labs org

It would be nice if you could help me gather the required data (10B tokens) for training a model on most of the internet's knowledge.
You can make it multimodal if you want.

DataMuncher Labs org

Ok, so how large is the model (in parameters)?

DataMuncher Labs org

Yk that would take ~19 days of training (19 TFLOPS, 500M params, 10B tokens), but sure, I can in fact provide that. Just tell me what kind of data you need :D
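For reference, the ~19 days figure can be roughly reproduced with the common rule of thumb of about 6 FLOPs per parameter per token for dense transformer training. This is only a sketch: the function name `training_days` and the `utilization` parameter are made up for illustration, and the estimate assumes the hardware sustains its full 19 TFLOPS, which real runs usually don't.

```python
SECONDS_PER_DAY = 86_400

def training_days(params, tokens, flops_per_sec, utilization=1.0):
    """Rough training time in days, assuming ~6 FLOPs per parameter per token."""
    total_flops = 6 * params * tokens                 # forward + backward pass, rule of thumb
    seconds = total_flops / (flops_per_sec * utilization)
    return seconds / SECONDS_PER_DAY

# Numbers from the comment above: 500M params, 10B tokens, 19 TFLOPS.
print(round(training_days(500e6, 10e9, 19e12), 1))    # 18.3
```

At lower utilization the number grows proportionally, e.g. `training_days(500e6, 10e9, 19e12, utilization=0.4)`. For a sparse MoE, the active parameter count per token is probably the more relevant value to plug in, but that is an assumption too.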

DataMuncher Labs org

I need text data, and the model is 200M parameters (I know, overfitting to some extent, but hey, this is a sparse MoE). I've got way more than 19 TFLOPS.
And thanks for being kind enough to help me find that!!!
It's for a general-purpose AI covering coding and whatever else counts as general purpose.

DataMuncher Labs org

How do you want me to give it to you?

DataMuncher Labs org

In this discussion or by making a dataset.

DataMuncher Labs org

I do have 5B tokens of data rn, is that fine?

DataMuncher Labs org

Yes

DataMuncher Labs org

Any updates so far?

DataMuncher Labs org

My bad for vanishing, I was working on a new web crawler (I am NOT babysitting it for ~12 hrs). If you need data quickly, download the large English Wikipedia dump and run wikiextractor on it (it's human-written text).
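A rough sketch of reading the extracted articles afterwards, assuming wikiextractor was run with its `--json` option so that each output file (named like `wiki_00`, inside subfolders) holds one JSON object per line with a `text` field. The folder name `extracted` and the helper names are hypothetical, and the whitespace word count is only a stand-in for a real tokenizer.

```python
import json
from pathlib import Path

def iter_articles(root):
    """Yield the text of each non-empty article found under `root`."""
    for path in sorted(Path(root).rglob("wiki_*")):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                text = json.loads(line).get("text", "")
                if text.strip():          # skip entries with no body text
                    yield text

def rough_word_count(root):
    """Whitespace-separated word count, NOT a real token count."""
    return sum(len(text.split()) for text in iter_articles(root))

# Example (hypothetical output folder):
# print(rough_word_count("extracted"))
```

The word count only gives a ballpark for how far the dump goes toward the 10B-token target; the actual token count depends on the tokenizer used.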

Roman190928 changed discussion status to closed
Roman190928 changed discussion status to open
DataMuncher Labs org

alright thanks!

DataMuncher Labs org

Did it work? :D

DataMuncher Labs org

Well, I couldn't find it lol.

DataMuncher Labs org

ok
