[SOLVED] Data miner thinking of contributing - some questions

Postby vulix » May 28th, '12, 01:45

Hello everyone,

I noticed the contribution page included the role of data mining. I'm actually a PhD student over at the University of Arizona, and my research interests fall under machine learning and data mining. Specifically, I program a lot in Java, a bit in Python, and do a lot of stats work using R. I'm also quite familiar with many data visualization tools. Some things I've done personally include clustering and classification through the WEKA API, sentiment analysis, and some basic regression models. I'd love to try and contribute to any data mining projects!

Mandrake Linux was the very first Linux distribution I tried back in 2001, leading me to try various flavors of Linux over the next few years. Ultimately I settled on the Windows platform, but recently I have had to use Linux frequently again because of specific scientific programs that are only compiled for Unix. I thought it would be a wonderful time to revisit my Linux roots and try to get involved in a small community to help out :D

Some questions:

1. What kind of time commitment is needed from someone contributing in data mining?

2. Is work bound to regular release cycles?

3. What type of projects are we looking at?

4. Who do I talk to to start getting involved?

Thanks very much =)
Last edited by vulix on May 28th, '12, 19:18, edited 1 time in total.
vulix
 
Posts: 4
Joined: May 28th, '12, 01:30

Re: Data miner thinking of contributing - some questions

Postby rda » May 28th, '12, 15:19

Hi and welcome!

1. What kind of time commitment is needed from someone contributing in data mining?

Everything has yet to be built.

2. Is work bound to regular release cycles?

No. That's up to the people doing it to decide.

3. What type of projects are we looking at?

As above, this is still more a wish to develop this work within Mageia, a feeling that we might have some interesting things to do here; but we need people more skilled in and dedicated to that. So it's very open. Come with a hypothesis to test, and let's see if we can find and gather the data about it. Or we can make an inventory of everything we can log and gather for this team and others to build on (somewhat related to https://bugs.mageia.org/show_bug.cgi?id=4034 for the infrastructure).

Among the things we could look at (each may need either just the existing logs, or setting new ones up):
  • we don't know which packages are really downloaded (or used), and we don't know very well (we don't show it anywhere) how the package set is laid out. For instance: are there big/heavy packages that are not significantly used and are still distributed on the ISOs? Which packages get the most updates, or none? For a given package, what activity is there around it? What are the main (in)active packages [per install, per use, per update]? What are the usage patterns [mostly desktop, mostly web, in between]? Or maybe something else. Better insight here could help build a more focused release media or experience (or a totally different one).
  • we don't know much yet about who is using Mageia (age, sex, language), where (country), on what type of device (desktop, server, mobile, touch or not) and what for (home, work, other); part of that is more about surveys, part of that more about logs (which will need to be properly designed so as not to be intrusive with regard to our privacy policy).
  • how do people interact within the project, on the forum and on the mailing lists? Are there different groups? Do they map to, or differ from, the teams in the project? Can we map specific roles from their interactions within the project?
  • how do people talk about the project (or about themselves as a community), within and outside of the project?
  • how is the project perceived and talked about by outsiders, out there on the Internet?
  • and so on.

So you see this is still very broad. One should pick one subject, focus on it, see how to implement it lightly, and demonstrate its benefits (better, useful knowledge) to the project and the community. And then move on to another subject.
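
To give an idea of how light a first pass on the "which packages are really downloaded" question could be, here is a minimal sketch in Python. Everything specific in it is an assumption for illustration: that a mirror keeps Apache-style access logs, that RPM file names follow the usual name-version-release.arch.rpm pattern, and the sample log lines themselves, which are made up. The names count_downloads, package_name and SAMPLE_LOG are hypothetical too. It deliberately ignores the client address, since anything real would have to fit the privacy policy.

```python
import re
from collections import Counter

# Assumption: Apache-style access log lines. We only look at the request
# and the status code, never at the client address.
RPM_REQUEST = re.compile(
    r'"GET \S*/(?P<file>[^/\s]+\.rpm) HTTP/[\d.]+" (?P<status>\d{3})'
)


def package_name(rpm_file):
    """Reduce 'name-version-release.arch.rpm' to just 'name'."""
    stem = rpm_file[:-len(".rpm")]
    stem = stem.rsplit(".", 1)[0]   # drop the architecture
    return stem.rsplit("-", 2)[0]   # drop version and release


def count_downloads(lines):
    """Count successful (HTTP 200) RPM downloads per package name."""
    counts = Counter()
    for line in lines:
        match = RPM_REQUEST.search(line)
        if match and match.group("status") == "200":
            counts[package_name(match.group("file"))] += 1
    return counts


# Made-up sample lines, only there to exercise the functions above.
SAMPLE_LOG = [
    '203.0.113.5 - - [28/May/2012:15:19:00 +0200] '
    '"GET /example/media/firefox-12.0-1.mga2.x86_64.rpm HTTP/1.1" 200 1000',
    '198.51.100.7 - - [28/May/2012:15:20:00 +0200] '
    '"GET /example/media/firefox-12.0-1.mga2.i586.rpm HTTP/1.1" 200 1000',
    '198.51.100.7 - - [28/May/2012:15:21:00 +0200] '
    '"GET /example/media/mageia-release-common-2-2.mga2.noarch.rpm HTTP/1.1" 200 500',
    '203.0.113.9 - - [28/May/2012:15:22:00 +0200] '
    '"GET /example/media/missing-1.0-1.mga2.noarch.rpm HTTP/1.1" 404 0',
]

if __name__ == "__main__":
    for name, hits in count_downloads(SAMPLE_LOG).most_common():
        print(name, hits)
```

On real logs one would pass the open log file to count_downloads instead of SAMPLE_LOG. Whether raw download counts say anything about actual use is exactly the kind of hypothesis that would need testing.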

4. Who do I talk to to start getting involved?

I can help you find your way in the project and set things up. You could ping bmahe on IRC too - or I can give you his email. And we can arrange and discuss something for a start.
rda
 
Posts: 20
Joined: Mar 16th, '11, 16:47
Location: Nantes, France

Re: Data miner thinking of contributing - some questions

Postby vulix » May 28th, '12, 16:58

Thanks for the reply, and that sounds great. I'll try to find you on IRC (and bmahe) and talk more in depth about potential projects and what you would prefer prioritized. I also need to figure out what data you are currently logging and what you aren't, but as long as the data is stored somewhere, we can do something with it =) Thanks for replying!
vulix
 
Posts: 4
Joined: May 28th, '12, 01:30

Re: Data miner thinking of contributing - some questions

Postby doktor5000 » May 28th, '12, 18:29

Mind marking the thread as [SOLVED], as your initial questions are answered so far? Helps to keep things clear ;)
Cauldron is not for the faint of heart!
Caution: Hot, bubbling magic inside. May explode or cook your kittens!
----
Disclaimer: Beware of allergic reactions in answer to unconstructive complaint-type posts
doktor5000
 
Posts: 17659
Joined: Jun 4th, '11, 10:10
Location: Leipzig, Germany

Re: Data miner thinking of contributing - some questions

Postby vulix » May 28th, '12, 19:18

doktor5000 wrote: Mind marking the thread as [SOLVED], as your initial questions are answered so far? Helps to keep things clear ;)


Done =)
vulix
 
Posts: 4
Joined: May 28th, '12, 01:30
