Machine Learning Research

Researching Machine Learning

research

How do

All of the test cases need to be the same length (number of pixels, characters, etc.) and produce a normalized output (again a fixed length, with a stable index across all possible inputs)

Example ways of normalizing test cases

  • Random crop of a fixed size out of images
  • Images placed randomly on a random noise buffer of the same size
  • A byte array of ~1000 random characters, with the log line placed at a random offset
  • Array of random assembly opcodes, with a disassembly of the function placed at a random offset
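A minimal sketch of the log-line case above: pad a variable-length line into a fixed-size buffer of random bytes at a random offset. The buffer length of 1000 comes from the note; the function name and truncation behavior are assumptions for illustration.

```python
import random

BUFFER_LEN = 1000  # fixed input length, per the ~1000-character note above

def normalize_log_line(line: bytes, buffer_len: int = BUFFER_LEN, seed=None) -> bytes:
    """Place a log line at a random offset inside a buffer of random bytes."""
    rng = random.Random(seed)
    if len(line) > buffer_len:
        line = line[:buffer_len]  # assumed policy: truncate oversized lines
    offset = rng.randrange(buffer_len - len(line) + 1)
    noise = bytes(rng.randrange(256) for _ in range(buffer_len))
    return noise[:offset] + line + noise[offset + len(line):]

sample = normalize_log_line(b"ERROR: disk full", seed=42)
```

Every sample is exactly BUFFER_LEN bytes regardless of the original line length, which is what lets them feed a fixed-size input layer.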

Example ways to remove bias

  • Shift the hue of the image
  • Rotate the image
  • Random placement on field of noise

Tags?

To classify tags, just have each tag as a bit in the output array:

{pixel 1x1, pixel 1x2, ...} { tag_has_house?, tag_has_dog?, tag_has_cat?}

That way, the output from a test will be the likelihood of each tag.
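Reading the tag bits off the output array could work like this: one score per tag at a fixed index, thresholded into predicted tags. The tag names and the 0.5 threshold are placeholders.

```python
# Assumed fixed tag order: each tag owns one index in the output array.
TAGS = ["tag_has_house", "tag_has_dog", "tag_has_cat"]

# Hypothetical sigmoid outputs from one test run, one likelihood per tag bit.
scores = [0.91, 0.12, 0.67]

# A tag is predicted present when its likelihood clears the threshold.
predicted = [tag for tag, p in zip(TAGS, scores) if p >= 0.5]
```

The same output array handles any number of tags; adding a tag only appends one more index, but as noted below the model then has to be retrained.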

Upsides:

  • One test run for any number of tags
  • Only one model for any number of tags

Downsides:

  • Model needs to be retrained when a new tag is applied to the training set

Need to research