Updates
A Reflection Thus Far
7/22/18
I’ve learned a lot of valuable things during my time at this REU. I was exposed to an entire field of computer science that I was previously unfamiliar with. My beginner’s look into machine learning and data mining has opened my eyes to an area I hadn’t known much about before. I feel like this has started to point me toward pursuing this area further in my academic and professional career. I’d like to find some online classes to supplement my learning while I work through my undergraduate requirements, and I think taking that initiative could be valuable as I move forward. I also feel very good about the people I have met during my time here. I’ve enjoyed spending time with all the members of the REU team, both while working and while out exploring the city. We’ve had our fair share of really good weekends and fun times while working on our research. I have also enjoyed working with the staff at George Mason. They’ve been very welcoming and helpful while I’ve been stumbling through my first experience with primary research, and very encouraging as I begin my undergraduate research.
This past week my partner and I have been finishing up our results and working towards getting our poster ready for presentation. Presenting my work in an academic environment is another skill that I’ve been able to work on. I had a brief exposure to this last semester, but this research is much more in depth. Along with working on my own poster, I also had the opportunity to see other academics’ posters at the Big Data PI conference, which was a very helpful experience. Overall, I’m very happy to have participated in this REU site and I’m glad to have been chosen to be a part of this team.
NSF Big Data Conference
6/20/18
The NSF Big Data Conference included a myriad of projects across disciplines, all rooted in the use of Big Data. One presentation and poster I found interesting was titled “Population Reproduction of Poverty at Birth from Surveys, Censuses, and Birth Registrations.” The research studied what goes into children being born at or below the poverty line. This is important because being born into a low-income household is a lifelong barrier to success. The researchers studied the different factors that play into a child being born into poverty, which include the parents' level of education, the matching of adults who have children, and the parents' race, ethnicity, and immigration status. From these factors they found that African American and Hispanic children were more at risk of being born into poverty than children of other races in the US. They also found that the rate of women marrying into higher classes has decreased substantially over the last 20 years, while men are increasingly marrying into higher classes. This was explained by women seeking higher education at an increasing rate, which contributes to a higher income, whereas men are increasingly dropping out of or not enrolling in higher education.
REU Data Hackathon
6/14/18
Our REU team participated in a data hackathon. We were presented with a data set of 800 data points, each with different attributes that contribute to the good or bad quality of wine. From that, we were allowed to use any program or code that would get us the highest classification accuracy.
My approach to this problem was to use Weka, which is machine learning software written in Java. I imported the CSV file into Weka and tried different classification techniques on the entire training file. I found that the random forest algorithm was the most accurate approach for this dataset. Originally I tried to write a script for the algorithm, but I instead turned to running the test file against the trained model inside of Weka. I eventually got the results to print into a separate CSV file, where I was then able to convert the results to 1s and 0s: 1 meant the quality was good and 0 meant the quality was bad. When I submitted the results I got 75% accuracy. The Weka output had predicted 80-82% accuracy, so it wasn't too far off and the model hadn't overfitted much, if at all.
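As a rough illustration of that last conversion and scoring step, here is a minimal Python sketch using only the standard library. It is not the process used at the hackathon, which was done inside Weka. The column name `predicted`, the label strings `good` and `bad`, and the four sample rows are all made-up assumptions for the example.

```python
import csv
import io

# Assumption: predictions are exported as text labels in a column
# named "predicted". The real Weka export may look different.
GOOD_LABEL = "good"


def read_predictions(csv_text, column="predicted"):
    """Return the predicted labels from CSV text, one per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader]


def labels_to_binary(labels, good_label=GOOD_LABEL):
    """Map each label to 1 (good quality) or 0 (anything else)."""
    return [1 if label.strip().lower() == good_label else 0 for label in labels]


def accuracy(predicted, actual):
    """Fraction of positions where the prediction matches the true value."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("cannot score an empty list")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(predicted)


# Made-up sample data: four predictions and their true quality values.
sample_csv = "id,predicted\n1,good\n2,bad\n3,good\n4,bad\n"
actual_quality = [1, 0, 0, 0]

binary = labels_to_binary(read_predictions(sample_csv))
print(binary)                            # [1, 0, 1, 0]
print(accuracy(binary, actual_quality))  # 0.75
```

Scoring against a held-out file like this is what makes it possible to compare the accuracy estimated during training with the accuracy on data the model has not seen.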
What is Research?
6/6/18
Research looks like the development of a question and seeing the question through to some sort of outcome. Usually a question starts out very broad and becomes narrower as certain topics or themes become apparent through literature research. The question targets an area not previously developed in order to add something to the academic conversation or to prove a scientific point. Research comes in many different forms depending on what area the research is being used for. Scientific research looks more like scientists in a lab, which is what people tend to think of when they think of research. Other types involve creating possible solutions to questions by drawing conclusions from accepted research. Still others work to dispute or confirm claims from other researchers.
Research helps establish connections between data and its impact on society. The analytical side of research involves seeing its overall context. It comprises the how, what, and most importantly why. A researcher must emphasize why the research done is important and why people should care. Not every why is a huge impact, but there must be a reason why the research was conducted.
There are different types of research as well. Primary research involves running tests or collecting data firsthand. This could be in a laboratory setting, running tests on a computer, or conducting interviews with other people. This type of research needs to have verifiable data. Experiments should be repeated enough times to get consistent results. Research involving groups of people should have a large enough sample size that the results reflect a majority of people. Smaller sample sizes could skew the data.
Another type is secondary research, which involves taking already established data, such as trials or other published academic papers, and then drawing conclusions from the already collected data. This could be to point out contentions in arguments or to solidly conclude similarities in research. This type of research can also be responsible for recreating experiments to confirm the results. A lot of academic papers rely on this type of research mixed with primary research. With both, it becomes easy to analyze another researcher’s conclusions and use their own data to confirm or dispute the findings.
The “end” to a research question is not always a solid, fixed answer. Research sometimes creates more questions than definitive outcomes. Every research question is merely a stepping stone to another idea, related or otherwise. Research is an investigation of new or old questions. It is a systematic process that involves the collection of data and some sort of analysis of that data. The process could uncover things that were never thought of by the original researchers. It could find overlaps between established and new data, but it could also find differences. Research tends to become a very circular process, as much of the time the research ends up needing further research, thus starting the process all over again.