machine learning in practice
metis 2015.10.20
data science problems are ambiguous
solve for x
x = 5 + 2
project evolution
A x = b
optimize
f(x)
optimize
A x = b
subject to
f(x) > 0
optimize
“our profitability”
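The first three problems on this slide are concrete enough to hand to code. A minimal sketch, assuming made-up values for A, b, and f (none are given in the slides), with Cramer's rule and ternary search as stand-in solvers:

```python
def solve_linear_2x2(a, b):
    """Solve A x = b for a 2x2 matrix A using Cramer's rule."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("A is singular")
    return ((b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - b[0] * a21) / det)


def minimize(f, lo, hi, steps=200):
    """Minimize a unimodal function f on [lo, hi] by ternary search."""
    for _ in range(steps):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2  # the minimum cannot lie to the right of m2
        else:
            lo = m1  # the minimum cannot lie to the left of m1
    return (lo + hi) / 2


# solve for x: nothing to decide
x = 5 + 2

# A x = b: hypothetical A and b, chosen only for illustration
x_vec = solve_linear_2x2(((2, 1), (1, 3)), (3, 5))

# optimize f(x): hypothetical f with its minimum at 2
x_min = minimize(lambda v: (v - 2) ** 2, -10, 10)
```

There is no comparable function to pass in for “our profitability”. Someone has to decide what it means first.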
origins of ambiguity
many feasible approaches
origins of ambiguity
unclear problems
identify the best locations to plant new trees
how many?
what kinds of trees?
move old trees?
replace old trees?
aesthetically pleasing?
maximize growth?
increase foliage?
offset CO2 emissions?
scientific method
hypothesis
experiment
measurement
agile programming
story/user cards
write code
QA; requirements churn
lean startup
business requirements
minimum viable product
split testing, A/B testing
human-centered design
personas, use cases
build device prototypes
surveys, interviews, focus groups
iterative problem solving
anticipate refining your solution
generate ideas
build prototype
evaluation
1-4 week iterations
a few case studies
data-driven expertise exploration
procter & gamble
Lorem Ipsum: a narrative about blankets.
Author: Charlie Brown
Date: 31 Jan 2012
Lorem Ipsum is a dummy text used when typesetting or marking up documents. It has a
long history starting from the 1500s and is still used in digital millennium for typesetting
electronic documents, page designs, etc.
In itself, the original text of Lorem Ipsum might have been taken from an ancient Latin
book that was written about 50 BC. Nevertheless, Lorem Ipsum’s words have been
changed so they don’t read as a proper text.
Naturally, page designs that are made for text documents must contain some text rather
than placeholder dots or something else. However, should they contain proper English
words and sentences almost every reader will deliberately try to interpret it eventually,
missing the design itself.
However, a placeholder text must have a natural distribution of letters and punctuation
or otherwise the markup will look strange and unnatural. That’s what Lorem Ipsum helps
to achieve.
I would like to thank Peppermint Patty for her support on studying
Lorem Ipsum as well as the infinite wisdom of Linus van Pelt and his
willingness to use his blanket in my experiments.
data-driven e-discovery
daegis
about patent
not about patent
turn over to plaintiff
don’t turn over to plaintiff
adverse inference
give away trade secrets
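One reading of this slide is a two-by-two grid: each document is either about the patent or not, and is either turned over or not. The sketch below assumes that withholding a relevant document risks an adverse inference and that handing over an irrelevant one risks giving away trade secrets. The function names and the sample decisions are hypothetical.

```python
def review_outcome(about_patent, turned_over):
    """Name the consequence of a single document decision."""
    if about_patent and not turned_over:
        return "adverse inference"
    if turned_over and not about_patent:
        return "give away trade secrets"
    return "no error"


def count_outcomes(decisions):
    """Tally outcomes over (about_patent, turned_over) pairs."""
    counts = {}
    for about_patent, turned_over in decisions:
        name = review_outcome(about_patent, turned_over)
        counts[name] = counts.get(name, 0) + 1
    return counts


# Hypothetical review decisions, one per cell of the grid.
decisions = [(True, True), (True, False), (False, True), (False, False)]
print(count_outcomes(decisions))
```

Both kinds of error are costly, so a reviewer cannot reduce one simply by turning over everything or nothing.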
create a “document map”
algorithm design
patents
marketing
finances
fantasy football
lunch
coffee
review away shades of grey
reduce reviews by 90-99%
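The slides do not say how the document map is built. As a rough illustration only, the sketch below groups documents greedily by the cosine similarity of their word counts. The sample documents, the threshold, and every function name are assumptions, not the Daegis method.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Count lowercase alphabetic words in a document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def document_map(docs, threshold=0.3):
    """Put each document in the first group whose first member it resembles."""
    groups = []  # each group is a list of document indices
    for i, doc in enumerate(docs):
        vec = vectorize(doc)
        for group in groups:
            if cosine(vec, vectorize(docs[group[0]])) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])  # no existing group was similar enough
    return groups


# Hypothetical documents echoing the topics on the slide.
docs = [
    "patent claim for the sorting algorithm",
    "lunch order: sandwiches and coffee",
    "algorithm patent filing and claim language",
    "coffee and lunch run at noon",
]
print(document_map(docs))  # [[0, 2], [1, 3]]
```

Once documents are grouped like this, a reviewer could judge a whole group at a time instead of reading every document in it.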
data-driven privacy
scrubadub
Hey Bo,
Our scrubadub project is available at
http://github.com/deanmalmgren/scrubadub and if anyone wants to help,
they should totally swing by our office at Adams and Wells in Chicago
(soon to be 17 N State) and pitch in. Of course, you can always just reach
out on Twitter (@deanmalmgren), skype (dean.malmgren) or just about
anything else for that matter.
Hokey dokey. See you in a bit.

Dean
placeholders
Hey {{NAME}},
Our {{NAME}} project is available at
{{URL}} and if anyone wants to help,
they should totally swing by our office at {{ADDRESS}}
(soon to be {{ADDRESS}}) and pitch in. Of course, you can always just
reach out on Twitter ({{TWITTER}}), skype ({{SKYPE}}) or just about
anything else for that matter.
Hokey dokey. See you in a bit.

{{NAME}}
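Placeholder replacement can be sketched with two regular expressions. This is a minimal illustration using only the standard library, not scrubadub's own implementation. The detector list and the `scrub` function are hypothetical, and the sample text is shortened from the email above.

```python
import re

# Hypothetical detectors: each pairs a placeholder label with a regex.
# URLs are replaced first so that handles are only searched for afterwards.
DETECTORS = [
    ("URL", re.compile(r"https?://\S+")),
    ("TWITTER", re.compile(r"@\w+")),
]


def scrub(text):
    """Replace every detected span with a {{LABEL}} placeholder."""
    for label, pattern in DETECTORS:
        text = pattern.sub("{{" + label + "}}", text)
    return text


email = (
    "Our scrubadub project is available at "
    "http://github.com/deanmalmgren/scrubadub and you can reach out "
    "on Twitter (@deanmalmgren)."
)
print(scrub(email))
```

Names and addresses are left out here because they need more than a simple pattern to detect.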
unique identifiers
Hey {{NAME-1}},
Our {{NAME-2}} project is available at
{{URL-1}} and if anyone wants to help,
they should totally swing by our office at {{ADDRESS-1}}
(soon to be {{ADDRESS-2}}) and pitch in. Of course, you can always just
reach out on Twitter ({{TWITTER-1}}), skype ({{SKYPE-1}}) or just about
anything else for that matter.
Hokey dokey. See you in a bit.

{{NAME-3}}
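Numbered placeholders keep distinct values apart, as with the three different names in the email. The sketch below assumes that a repeated value should reuse its number, which the slide does not state. It handles only Twitter-style handles, and the function name and sample text are hypothetical.

```python
import re

HANDLE = re.compile(r"@\w+")


def scrub_with_ids(text, label="TWITTER"):
    """Replace each handle with {{LABEL-n}}, reusing n for repeated values."""
    ids = {}  # maps an original value to its identifier number

    def replace(match):
        value = match.group(0).lower()
        if value not in ids:
            ids[value] = len(ids) + 1  # numbering starts at 1, as on the slide
        return "{{%s-%d}}" % (label, ids[value])

    return HANDLE.sub(replace, text)


text = "Ask @deanmalmgren or @example; @deanmalmgren replies fastest."
print(scrub_with_ids(text))
# Ask {{TWITTER-1}} or {{TWITTER-2}}; {{TWITTER-1}} replies fastest.
```

The scrubbed text still shows that the first and third mentions refer to the same account without revealing which account it is.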
surrogate data
Hey Josie,
Our bleckenschpiel project is available at
http://example.com/bleck and if anyone wants to help,
they should totally swing by our office at 42nd & Broadway in New York
(soon to be 30th & 8th) and pitch in. Of course, you can always just reach
out on Twitter (@b3ujqw9), skype (xylophone1) or just about
anything else for that matter.
Hokey dokey. See you in a bit.

Billy
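Surrogate data swaps each value for a realistic-looking fake. The sketch below is one possible approach for handles only, assuming a surrogate derived from a hash so that the same input always maps to the same fake. The function names are hypothetical and this is not scrubadub's implementation.

```python
import hashlib
import re
import string

HANDLE = re.compile(r"@\w+")
ALPHABET = string.ascii_lowercase + string.digits


def surrogate_handle(handle, length=7):
    """Derive a stable fake handle from a hash of the real one."""
    digest = hashlib.sha256(handle.lower().encode("utf-8")).digest()
    # Map the first few hash bytes onto letters and digits.
    return "@" + "".join(ALPHABET[b % len(ALPHABET)] for b in digest[:length])


def scrub_with_surrogates(text):
    """Replace every handle in the text with its surrogate."""
    return HANDLE.sub(lambda m: surrogate_handle(m.group(0)), text)


print(scrub_with_surrogates("reach out on Twitter (@deanmalmgren)"))
```

An unkeyed hash like this can be reversed by hashing a list of known handles, so a real system would need a secret key or a random lookup table.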
general purpose
name
phone number
email address
...
custom
government acronym
part number
bus route
...
machine learning in practice
novelty by design
