Hi and welcome!
1. What kind of time commitment is needed from someone contributing to data mining?
Everything has yet to be built.
2. Is work bound to regular release cycles?
No. That's up to the people doing it to decide.
3. What type of projects are we looking at?
As above, this is still more of a wish to develop this work within Mageia, a feeling that there are interesting things to do here; but we need people who are sharper on the topic and dedicated to it. So it's very open. Come with a hypothesis to test, and let's see if we can find and gather the data for it. Or we can make an inventory of everything we can log and gather, for this team and others to build on (somewhat related to https://bugs.mageia.org/show_bug.cgi?id=4034 for the infrastructure).
Among the things we could look into (each may need either just using existing logs, or setting new ones up):
- we don't know which packages are really downloaded (or used), and we don't know very well (we don't show it anywhere) the layout of packages. For instance: are there big/heavy packages that are not significantly used and are still distributed on the ISOs? Which packages get the most updates, or none? For a given package, what activity does it see? What are the most (in)active packages [per install, per use, per update]? What are the usage patterns [mostly desktop, mostly web, in between]? Or maybe something else. Better insight here could help build a more focused release media or experience (or a totally different one).
- we don't know much yet about who is using Mageia (age, sex, language), where (country), on what type of device (desktop, server, mobile, touch-enabled or not) and what for (home, work, other); part of that is more a matter of surveys, part of it more a matter of logs (which will need to be properly designed so as not to be intrusive with regard to our privacy policy).
- how do people interact within the project, on the forum and on the mailing lists? Are there different groups? Do they map to, or differ from, the teams in the project? Can we map specific roles from their interactions within the project?
- how do people talk about the project (or about themselves as a community), within and outside of the project?
- how is the project perceived and talked about by outsiders, out there on the Internet?
- and so on.
So you see this is still very broad. One should pick one subject, focus on it, see how to implement it lightly, and demonstrate its benefits (better, useful knowledge) to the project and the community. And then move on to another subject.
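To give an idea of what "lightly" could mean for the first item (which packages are really downloaded), here is a minimal sketch in Python. It is only an illustration, not something that exists in the project: it assumes a mirror's web server writes access logs in Common Log Format, and that package files follow the usual name-version-release.arch.rpm naming. The function names and the sample log lines (addresses, paths, versions) are all made up for the example. It keeps only per-package counts and nothing about the clients, which would matter for the privacy policy.

```python
import re
from collections import Counter

# Assumption: Common Log Format, where the request and status look like
# "GET /some/path/file.rpm HTTP/1.1" 200
REQUEST_RE = re.compile(r'"GET (\S+\.rpm) HTTP/[\d.]+" (\d{3})')


def package_name(rpm_path):
    """Return the package name from a name-version-release.arch.rpm path."""
    filename = rpm_path.rsplit("/", 1)[-1]
    stem = filename[:-len(".rpm")]
    # Drop the architecture suffix (x86_64, i586, noarch, ...).
    nvr, _, _arch = stem.rpartition(".")
    if not nvr:
        nvr = stem
    # The last two dash-separated fields are version and release.
    parts = nvr.rsplit("-", 2)
    return parts[0] if len(parts) == 3 else nvr


def count_downloads(log_lines):
    """Count successful (HTTP 200) .rpm downloads per package name."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) == "200":
            counts[package_name(match.group(1))] += 1
    return counts


# Made-up sample lines, for illustration only (not real log data).
SAMPLE_LOG = [
    '192.0.2.1 - - [01/Jan/2013:10:00:00 +0000] "GET /pkgs/firefox-17.0.1-1.mga2.x86_64.rpm HTTP/1.1" 200 1234',
    '192.0.2.2 - - [01/Jan/2013:10:00:05 +0000] "GET /pkgs/firefox-17.0.1-1.mga2.i586.rpm HTTP/1.1" 200 1234',
    '192.0.2.3 - - [01/Jan/2013:10:00:09 +0000] "GET /pkgs/perl-URI-1.60-1.mga2.noarch.rpm HTTP/1.1" 200 99',
    '192.0.2.4 - - [01/Jan/2013:10:00:12 +0000] "GET /pkgs/vim-7.3-1.mga2.x86_64.rpm HTTP/1.1" 404 0',
    '192.0.2.5 - - [01/Jan/2013:10:00:15 +0000] "GET /index.html HTTP/1.1" 200 10',
]

if __name__ == "__main__":
    for name, hits in count_downloads(SAMPLE_LOG).most_common():
        print(name, hits)
```

Counting downloads this way says nothing about actual use, and a single mirror only sees part of the traffic, so it would be a starting point for a hypothesis rather than an answer.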
4. Who do I talk to to start getting involved?
I can help you find your way in the project and set things up. You could ping bmahe on IRC too - or I can give you his email. And we can arrange and discuss something for a start.