# (PyCon 2014 Video) How To Get Started with Machine Learning - Melanie Warrick's PyCon 2014 Talk

**Save $500 on the machine learning course for developers (May 10-11, 2014 at Hackbright Academy). Use promo code "HB-BLOG" to save $500 when you register for the Hackbright machine learning course here.**

Melanie Warrick, data scientist and software engineer for hire, Zipfian and Hackbright graduate, gives a talk at the annual PyCon 2014 conference.

Watch her full PyCon 2014 talk on machine learning:

*Arthur Samuel defined machine learning as a “field of study that gives computers the ability to learn without being explicitly programmed”.* Machine learning is about applying algorithm(s) in a program to solve the problem you are faced with and address the type of data that you have. You create a model that will help conduct pattern matching and/or predict results. Then you evaluate the model and iterate on it as needed to create the right type of solution for the problem.

Examples of machine learning (ML) in the real world include handwriting analysis, which uses neural nets to read millions of pieces of mail and to sort and classify the many variations in written addresses. Weather prediction, fraud detection, search, and facial recognition are all examples of machine learning in the wild.

### Algorithms

There are several types of ML algorithms to choose from and apply to a problem - some are listed below and are broken into categories to give an approach on how to think about applying them. When choosing an algorithm, it's important to think about the goal/problem, the type of data available and the time and effort that you have to work on the solution.

### Machine Learning Algorithms

A couple of starting points are worth considering. The first is whether the data is supervised or unsupervised. Supervised means you have actual data that represents the results you are targeting, which you can use to train the model. Spam filters, for example, are built on actual data that has been labeled as spam. Unsupervised data doesn’t have a clear picture of the result. For unsupervised learning, you will have questions about the data, and you can run algorithms on it to see if patterns emerge that help tell a story. Unsupervised learning is a challenging approach, and typically there isn’t a single “right” answer for the solution.
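
To make the distinction concrete, here is a minimal sketch in plain Python. The messages, numbers, and function names are all made up for illustration, and the word-count classifier and two-group k-means loop are toy stand-ins, not the methods used in the talk. The supervised half learns from labeled examples; the unsupervised half only looks for structure in unlabeled values.

```python
# Supervised: every training example comes with a known label.
labeled = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("lunch meeting at noon", "ham"),
    ("project notes from the meeting", "ham"),
]


def train_word_counts(examples):
    """Count how often each word appears under each label."""
    counts = {}
    for text, label in examples:
        label_counts = counts.setdefault(label, {})
        for word in text.split():
            label_counts[word] = label_counts.get(word, 0) + 1
    return counts


def classify(text, counts):
    """Pick the label whose training words overlap most with the text."""
    scores = {
        label: sum(words.get(word, 0) for word in text.split())
        for label, words in counts.items()
    }
    return max(scores, key=scores.get)


counts = train_word_counts(labeled)
prediction = classify("claim your free money", counts)


# Unsupervised: no labels, so we only look for groups in the values.
def two_means(values, iterations=10):
    """Split 1-D values into two groups with a basic k-means loop."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


unlabeled = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
groups = two_means(unlabeled)
print(prediction, groups)
```

In the supervised case you can check the prediction against a known answer. In the unsupervised case the two groups are only a suggestion, and it is up to you to decide whether they tell a useful story.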

In addition, whether the data is continuous (e.g. height, weight) or categorical/discrete (e.g. male/female, Canadian/American) helps determine the type of algorithm to apply. Basically, it's about whether the data falls into a set number of units that can be defined or whether the variations in the data are nearly infinite. These are some ways to evaluate what you have to help identify an approach to solve the problem.

Note that the algorithm categorization has been simplified a bit to help provide context, and some of the algorithms do cross the above boundaries (e.g. linear regression).
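
One way to picture the two questions together is as a small lookup. The sketch below assumes the common two-by-two simplification (labels or no labels, continuous or discrete output). The function name and the four family names are illustrative choices, not a list taken from the talk, and real algorithms can cross these lines.

```python
def suggest_algorithm_family(has_labels, output_is_continuous):
    """Map two questions about the data to a broad family of algorithms.

    This is a deliberately simplified rule of thumb, not a strict rule.
    """
    if has_labels:
        # Supervised: we know the answers for the training data.
        return "regression" if output_is_continuous else "classification"
    # Unsupervised: we are looking for structure without known answers.
    return "dimensionality reduction" if output_is_continuous else "clustering"


# Predicting a number (such as a weight) from labeled examples.
family_for_weight = suggest_algorithm_family(has_labels=True, output_is_continuous=True)

# Sorting email into spam / not spam from labeled examples.
family_for_spam = suggest_algorithm_family(has_labels=True, output_is_continuous=False)

# Grouping unlabeled customers into segments.
family_for_segments = suggest_algorithm_family(has_labels=False, output_is_continuous=False)

print(family_for_weight, family_for_spam, family_for_segments)
```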

### Models

Once you have the data and an algorithmic approach, you can work on building a model. A model can be something as simple as an equation for a line (y=mx+b) or as complex as a neural net with many layers and nodes.

Linear Regression is a machine learning algorithm and a simple one to start with, where you find the best fit line to represent observed data. In the talk, I showed two different examples of having observed data that exhibited some type of linear trend. There was a lot of noise (data was scattered around the graph), but there was enough of a trend to demo linear regression.

When building a model with linear regression, you want to find the optimal slope (m) and intercept (b) based on the actual data. See, algebra is actually applicable in the real world. This is a simple enough algorithm that you can calculate the model yourself, but it's better to leverage tools like scikit-learn’s library to calculate the best fit line more efficiently. What you are calculating is a line that minimizes the overall distance between the line and the observed data points (in ordinary least squares, the sum of the squared vertical distances).
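
For a sense of what "calculating the model yourself" can look like, here is a minimal sketch of the closed-form ordinary least squares solution for a single input variable. The function name and the sample points are made up for illustration, and this is not the code from the talk.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations in x.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_xy / sum_xx
    # The best fit line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that roughly follow y = 2x + 1 with a little noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

In practice, scikit-learn's `LinearRegression` estimator fits the same kind of model and also handles more than one input variable, which is one reason to prefer a library over hand-rolled math.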

After generating a model, evaluate the performance and iterate to improve the model as needed if it is not performing as expected. I have explained linear regression here.
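
The post doesn't say which measure to use when evaluating, so the sketch below assumes two common choices for regression models: mean squared error and R squared. The sample values are made up for illustration.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the variance in the actual values that the model explains."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total


# Made-up observed values and the values a model predicted for them.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.0]

mse = mean_squared_error(actual, predicted)
r2 = r_squared(actual, predicted)
print(f"MSE = {mse:.3f}, R^2 = {r2:.3f}")
```

A lower error and an R squared closer to 1 suggest a better fit on this data. Ideally you would compute these on data the model was not trained on before deciding whether to iterate.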

### Prediction

When you have a good model, you can take in new data and output predictions. Those predictions can feed into some type of data product or generate results for a report or visualization.

In my presentation, I used actual head size and brain weight data to build a model that predicts brain weight based on head size. Since the data set was fairly small, the model has less predictive power and more potential for error. I went with this data since it was a demo, and I wanted to keep it simple. When graphed, the observed data was spread out, which also indicated error and a lot of variance in the data. So the model predicts brain weight with a good amount of variance.

With the linear model I built, I could feed it a head size (x) and it would calculate the predicted brain weight (y). Other models are more complex in their underlying math and application. To see the full code solution, check out the GitHub repository as noted above. The script is written a little differently from the slides because I created functions for each of the major steps. Also, there is an IPython notebook that shows some of the drafts I worked through to build out the code for the presentation.
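
As a rough sketch of what applying a fitted line looks like, the snippet below wraps a slope and an intercept in a prediction function. The coefficient values, the input values, and the function names are hypothetical placeholders chosen for easy arithmetic. They are not the numbers or the code from the talk's repository.

```python
def make_predictor(slope, intercept):
    """Return a function that applies the line y = slope * x + intercept."""
    def predict(x):
        return slope * x + intercept
    return predict


# Hypothetical coefficients for illustration only; not the values from the talk.
predict_brain_weight = make_predictor(slope=0.25, intercept=300.0)

# Hypothetical new head sizes to feed into the model.
new_head_sizes = [3500, 4000]
predictions = [predict_brain_weight(x) for x in new_head_sizes]
print(predictions)
```

Once the slope and intercept are known, prediction is just arithmetic, which is why a simple linear model is easy to drop into a report or a data product.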

### Tools

The Python stack is becoming popular for scientific computing because of the well-supported toolsets. Below is a list of key tools to start learning if you want to work with ML. There are many other Python libraries out there for more nuanced needs in the space as well as other stack packages to explore (R, Java, Julia).

If you are trying to figure out where to start, here are my recommendations:

• Scikit-Learn = machine learning algorithms

• Pandas = dataframe tool

• NumPy = matrix manipulation tool

• SciPy = stats models

• Matplotlib = visualization

### Skills

In order to work with ML algorithms and problems, it's important to build out your skill set regarding the following:

• Algorithms

• Statistics (probability, inferential, descriptive)

• Linear Algebra (vectors & matrices)

• Data Analysis (intuition)

• SQL, Python, R, Java, Scala (programming)

• Databases & APIs (get data)

### Resources

Below is a beginning list of resources to get you started.

I highly recommend Andrew Ng’s class and a couple of links are to sites with more recommendations on what to check out next:

• Andrew Ng’s Machine Learning on Coursera

• Khan Academy courses in linear algebra and stats

• “Think Stats” by Allen Downey

• Zipfian’s "A Practical Intro to Data Science"

• Metacademy

• Open Source Data Science Masters

• Stack Overflow, Data Tau, Kaggle

• Mentors

One point to note from this list, and I stressed this in the talk: seek out mentors! They are out there and willing to help. You have to put it out there what you want to learn and then be aware when someone offers to help. Also, follow up. Don’t stalk the person, but reach out to see if they will make a plan to meet you. They may only have an hour or they may give you more time than you expect. Just ask, and if you don’t get a good response or have a hard time understanding what they share, don’t stop there. Keep seeking out mentors. They are an invaluable resource to get you much farther, faster.

### Last Point to Note

ML is not the solution for everything, and many times it can be overkill. You have to look at the problem you are working on to determine what makes the most sense for your solution and how much data you have available.

I highly recommend looking for the simple solution first before reaching for something more complex and time-consuming. Sometimes regex is the right answer, and there is nothing wrong with that. As mentioned, to figure out an approach, it's good to understand the problem, the data, the amount of data you have, and the time you have to turn the solution around.
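
As a small illustration of the "simple solution first" idea, here is a sketch that pulls US ZIP codes out of address strings with a regular expression instead of a trained model. The addresses and the function name are made up, and the pattern only covers the five-digit and ZIP+4 formats.

```python
import re

# Five digits, optionally followed by a hyphen and four more digits.
ZIP_PATTERN = re.compile(r"\b\d{5}(?:-\d{4})?\b")


def find_zip_codes(text):
    """Return every ZIP code-like token found in the text."""
    return ZIP_PATTERN.findall(text)


# Made-up addresses for illustration.
addresses = [
    "123 Main St, Springfield, IL 62704",
    "1 Market St, San Francisco, CA 94105-1234",
]
zip_codes = [find_zip_codes(address) for address in addresses]
print(zip_codes)
```

If the input is printed text in a predictable format, a rule like this can be enough. Handwritten addresses with many variations are where a learned model starts to earn its complexity.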

Good luck in your ML pursuit.

### References

These are the main references I used in putting together my talk and post:

• Zipfian

• Framed.io

• “Analyzing the Analyzers” – Harlan Harris, Sean Murphy, Marck Vaisman

• “Doing Data Science” – Rachel Schutt & Cathy O’Neil

• “Collective Intelligence” – Toby Segaran

• “Some Useful Machine Learning Libraries” (blog)

• University GPA Linear Regression Example

• Scikit-Learn (esp. linear regression)

• Mozy Blog

• StackOverflow

• Wiki

*This post was originally posted at Melanie Warrick's blog.*
