Saturday, July 2, 2016

Machine Learning / Computer Vision

2 July, 2016: So I've spent a bit over 1K hours reading about and watching video lectures on neural nets. I was able to code the core algorithm for feedforward/backpropagation in JavaScript. Here's the code I wrote: https://github.com/DiginessForever/randomCode/tree/master/machineLearning
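
For anyone curious, here's a minimal Python sketch of the same algorithm (my actual code is the JavaScript at the link above; this is just the core idea, a single hidden layer trained with gradient descent):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Toy network: 2 inputs -> 3 hidden -> 1 output, sigmoid activations throughout.
    np.random.seed(0)
    W1, b1 = np.random.randn(3, 2), np.zeros(3)
    W2, b2 = np.random.randn(1, 3), np.zeros(1)
    lr = 0.5  # learning rate

    def train_step(x, target):
        global W1, b1, W2, b2
        # Feedforward pass.
        h = sigmoid(W1 @ x + b1)
        y = sigmoid(W2 @ h + b2)
        # Backpropagation: delta = error times the sigmoid derivative.
        delta_out = (y - target) * y * (1.0 - y)
        delta_hid = (W2.T @ delta_out) * h * (1.0 - h)
        # Gradient descent weight updates.
        W2 -= lr * np.outer(delta_out, h); b2 -= lr * delta_out
        W1 -= lr * np.outer(delta_hid, x); b1 -= lr * delta_hid

    # Usage: learn XOR.
    for _ in range(5000):
        for x, t in [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]:
            train_step(np.array(x, dtype=float), t)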

I still need to gather data to train it; I haven't been able to use it yet because it isn't hooked up to a dataset. Some possibilities would be teaching a robot to walk (staying upright, given a desired direction and the status of whether it is standing/walking or has fallen down), or I could do the "hello world" project and have it recognize characters (letters and numbers).

Other updates: Elon Musk and company have come out with OpenAI Gym.

I've been following along with Adrian Rosebrock's PyImageSearch blog (I bought his book and his Ubuntu virtual machine image as well).

I also found this recently:
http://www.computervisionblog.com/2015/01/from-feature-descriptors-to-deep.html


To watch later:
https://www.youtube.com/watch?v=3lp9eN5JE2A "Evolving AI Lab: Deep Learning Overview & Visualizing What Deep Neural Networks Learn"
https://www.youtube.com/watch?v=szHRv4MwCBY "Microsoft Research: Recent Advances in Deep Learning at Microsoft: A Selected Overview"
Soft Robotics, a YouTube playlist: https://www.youtube.com/playlist?list=PL9Bl8hjGqGOUaRjANlTqGmPWtYmM99lfM
---------------------
Post from May 6th, 2016:
Next step for my neural net project: learn to use the Raspberry Pi deep belief SDK, which gives access to the onboard GPU's 20 GFLOPS (a lot faster than using only the CPU, which also has to run the operating system). Yay, I don't have to hack a solution in GPU assembler anymore. That was the worst headache: I spent a good bit of time reading the open-sourced documentation for the GPU, it wasn't going to be pretty, and it would only have been good for one device. I am just about done with the primary code for my JavaScript neural net. I definitely recommend coding a solution out if you do not understand a subject completely. It forces you to wrap your mind around it, and if you think you understand it but do not, you'll find out very quickly when it doesn't work.

Last weekend, Elon Musk and a bunch of other researchers and companies interested in AI released a tool, OpenAI Gym. It's what I thought I was going to have to code myself: environments that give visual feedback and let your neural net control a model/actor to solve a problem. The cool thing about this tool is that (at least according to the documentation) it is framework agnostic, meaning you can train a neural net with any of several deep learning libraries, then hand the trained net to the tool, where it works to solve the problem.
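
Here's what the basic loop looks like in Python, following the flavor of their docs (CartPole is one of the sample environments; a random policy stands in for a trained net):

    import gym

    env = gym.make('CartPole-v0')
    observation = env.reset()
    for t in range(1000):
        env.render()
        action = env.action_space.sample()  # random action; a trained net would choose here
        observation, reward, done, info = env.step(action)
        if done:
            print('Episode finished after %d timesteps' % (t + 1))
            break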

One cool thing I've found out: the feedforward/backprop neural net is the foundation for all the deep learning research that's been going on. I am still at a pretty severe disadvantage compared to real researchers, though; they have top-of-the-line GPUs, so they can run much larger networks much faster.

However, having programmed a neural net, I can at least now understand a lot of what they're talking about. For instance, I found a new set of researchers to read up on: a Swiss AI lab that has won five past international machine learning competitions. There are two things they are doing that I want to learn about.
1. Long Short-Term Memory (LSTM) with recurrent neural nets. They link the layers a bit differently to make a sort of logic gate which remembers or forgets based on relevance (see the sketch after this list).
2. Hierarchical neural nets. I didn't see any links to this as I scanned through the research links on their page, but I did see references to it in articles. Somehow, they stack trained neural nets to build a more comprehensive net that understands more.
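
Here's my rough understanding of the gating in a single LSTM cell step, as a numpy sketch (this is the standard formulation, not necessarily exactly what the Swiss lab runs):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, W, b):
        # W maps the concatenated [h_prev, x] to four gate pre-activations,
        # so W has shape (4*n, n + len(x)) and b has shape (4*n,).
        z = W @ np.concatenate([h_prev, x]) + b
        n = h_prev.size
        f = sigmoid(z[0:n])          # forget gate: how much old cell state to keep
        i = sigmoid(z[n:2*n])        # input gate: how much new information to write
        o = sigmoid(z[2*n:3*n])      # output gate: how much of the cell to expose
        g = np.tanh(z[3*n:4*n])      # candidate values
        c = f * c_prev + i * g       # the "remember or forget based on relevance" step
        h = o * np.tanh(c)           # new hidden state
        return h, c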

Here's Yann LeCun's MNIST dataset: http://yann.lecun.com/exdb/mnist/

And here's another C implementation; this one also uses for loops instead of matrices: http://www.cs.bham.ac.uk/~jxb/NN/nn.html I'm working on the last piece of the backprop right now. The final questions I have are about step size and learning rate. Apparently, it's best to shrink them as the net starts to settle on a solution.
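
On that point, a common recipe (my assumption of what's meant, not something from that page) is simple step decay:

    # Halve the learning rate every 20 epochs; the constants are arbitrary.
    initial_lr = 0.5
    for epoch in range(100):
        lr = initial_lr * (0.5 ** (epoch // 20))
        # ... run one epoch of backprop updates using lr ...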



----------
Post from 1 May, 2016:
https://gym.openai.com/docs

Interesting. Facebook lets me post some links, but not others. For instance, on Friday I gave up trying to post about that morning's announcement by Movidius of the new USB stick deep learning accelerator they had developed. The FB form wouldn't accept the link when I copied and pasted it.

In any case, OpenAI is Elon Musk's company, supposedly open-sourcing AI. They have indeed made tools available, but they are also still open to patenting anything they get their hands on. So beware of giving them too much or getting too invested in that toolset.

-----------
26 April, 2016:
Machine learning on the RPi: https://scientistnobee.wordpress.com/2014/06/20/machine-learning-with-raspberry-pi/
Found a link which explains a few questions I had on backprop: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
Combine that with this straightforward, simple C implementation of what he's talking about in that blog post (it'll answer any remaining questions): http://www.cs.bham.ac.uk/~jxb/NN/nn.html
-----------
23 April, 2016:
This is a blog post from Adobe back in 2011. I saw a link to it in the comments on a CSI-related article. Basically, if you take a lot of pictures of a static scene, you can get more resolution. If anything is moving, it won't work for that thing. I'll have to dig further to see what algorithms are used.
https://blogs.adobe.com/photoshop/2011/10/behind-all-the-buzz-deblur-sneak-peek.html
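
I don't know Adobe's actual algorithm, but the simplest version of the multi-shot idea is plain stacking: average the aligned frames and the noise drops roughly with the square root of the frame count (true super-resolution would also need subpixel alignment and upsampling). A sketch, with a hypothetical burst/ folder:

    import glob
    import numpy as np
    from PIL import Image

    # Average a burst of aligned shots of a static scene.
    frames = [np.asarray(Image.open(f), dtype=np.float64)
              for f in sorted(glob.glob('burst/*.jpg'))]  # hypothetical folder
    stacked = np.mean(frames, axis=0)
    Image.fromarray(np.clip(stacked, 0, 255).astype(np.uint8)).save('stacked.jpg')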
----------
20 April, 2016:
This is kind of neat. Leaves out integrals, which is a whole different beast:
https://www.quora.com/What-is-the-correct-algorithm-to-perform-differentiation-using-a-computer-program-for-any-function-entered-by-the-user/answer/Prashant-Sharma-12?srid=uOUuv&share=5c81d9a3
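
For comparison, the numeric (rather than symbolic) version is just a central difference:

    def derivative(f, x, h=1e-5):
        # Central difference: error shrinks as O(h^2), vs O(h) for the one-sided version.
        return (f(x + h) - f(x - h)) / (2.0 * h)

    # Usage: derivative of x^2 at x = 3 is ~6.
    print(derivative(lambda x: x * x, 3.0))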
---------
16 April, 2016:
Wow, I've never been so close to completing this neural net that's been stuck in my craw for so long. https://github.com/DiginessForever/randomCode

Still working on backprop, but luckily, there's this really smart high schooler on YouTube who actually uses legible notation and has coded a passable implementation (most professors don't seem to code, so I end up seeing a lot of really hard-to-understand C code written by random people that doesn't go with whatever lecture I'm looking at): https://github.com/SullyCh…/FFANN/blob/master/cpps/FFANN.cpp

That, combined with a ton of other Youtube tutorials, makes it look like I'll push my brain over the ledge.

https://www.youtube.com/watch?v=gl3lfL-g5mA
--------------
16 April 2016:
Just got back from the physics department at WashU in Saint Louis. They have talks every Saturday in October and April, sometimes into May. Today's talk was about vision arising from neurons. I am pretty sure Rupert Murdoch was sitting on the other side.
-- The professor didn't talk about computer science neural net models, only the experiments and models they've been doing since 2013, when the BRAIN Initiative started. Apparently, they cut the brains and eyes out of turtles and have them watch movies while recording the signals running down the nerves from the eye to the brain.

-- I did learn one thing new. There are weights both in the neurons themselves and in the axons, the area between the connections of one neuron to another. Also, more about compression from the retina onward: apparently only about a million bits of information move in pulses down the optic nerve, while much more information is taken in by the three cone types (wavelengths) and the rods (low light amplitude).

-- So it definitely doesn't help me with backprop, and it feels like that community is around 60-70 years behind on models. I'm sure they'll catch up fast.
------------
12 April, 2016:
I checked out 123D Catch the other day. It's pretty smooth. You take about 15 pictures around an object and it turns them into a single 3D model of the object. It uses the phone gyro and compass to give you a map of which areas you've already captured and which you still need. I did one of my boots and got a very high quality, fully textured model with no extraneous floating points. It's kind of like VisualSFM, only a closed-source, black-box solution. The models turn out better and the app is free, but if you want to use the models for a product, it's $10/month.

I really would like to have a completely non-cloud-based software solution that's pipelined so that I can move my phone around an object, have it grab a bunch of pictures, then crunch those later to automatically create a 3d model (perhaps when cell phones are even faster, it will be possible to do the data crunching/picture comparisons on-the-fly).
------------
3 April 2016:
https://www.youtube.com/watch?v=xsEdu6Xq6KU

Been doing more reading. Getting a little further, slowly. This answer on Quora gives a good entry point to each part of the process:
https://www.quora.com/What-is-feature-detector-descriptor-descriptor-extractor-and-matcher-in-computer-vision?share=1

Found this site as well: http://www.vlfeat.org/api/index.html


Another resource - OpenCV page: http://docs.opencv.org/…/feature_d…/feature_description.html

This subject is actually rather huge - I imagine that one day we'll have inverse graphics cards - since computer vision is kind of like doing everything a graphics card currently does, but backwards (with a lot more processing involved).

-- This guy has some great videos. This one explains feature descriptors; I'm not done watching it yet: https://www.youtube.com/watch?v=oFVexhcltzE . I got sidetracked onto Khan Academy to learn about Laplace transforms. Apparently, the math involved in making a feature descriptor scale invariant (able to detect that it's the same object regardless of how big or small / near or far away it appears) depends on the Laplacian.

-- The Harris corner detector is rotation invariant, and the Laplacian is somehow used to make it scale invariant (found this too: https://www.youtube.com/watch?v=NPcMS49V5hg ). The two are combined to match features between images of the same scene from different cameras. Once features are matched, you can do trig to find the distance (z-coordinate), giving you a point cloud.
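
For the simple rectified, side-by-side camera case, the trig collapses to similar triangles (a sketch under that assumption):

    def depth_from_disparity(focal_px, baseline_m, disparity_px):
        # A matched feature shifts disparity_px pixels between the two images;
        # similar triangles give Z = f * B / d.
        return focal_px * baseline_m / disparity_px

    # Usage: f = 700 px, cameras 10 cm apart, 35 px disparity -> 2.0 m away.
    print(depth_from_disparity(700.0, 0.10, 35.0))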

-- Apparently, if you use SIFT, you have to pay royalties (the University of British Columbia owns it). The professor did an awesome job. Now, however, there is another algorithm which outperforms it and has an open license: KAZE. I'm going to have to compare the two algorithms to see if KAZE does everything I need.
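
OpenCV 3 ships KAZE, so the comparison should be easy to set up. A minimal detect-and-match sketch (the filenames are placeholders):

    import cv2

    img1 = cv2.imread('left.jpg', cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread('right.jpg', cv2.IMREAD_GRAYSCALE)

    kaze = cv2.KAZE_create()
    kp1, des1 = kaze.detectAndCompute(img1, None)
    kp2, des2 = kaze.detectAndCompute(img2, None)

    # KAZE descriptors are floating point, so L2 distance is appropriate.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des1, des2, k=2)

    # Lowe's ratio test to keep only distinctive matches.
    good = [m for m, n in matches if m.distance < 0.7 * n.distance]
    print(len(good), 'good matches')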

---------
29 March, 2016:
Saw something interesting recently. Apparently, my difficulty in progressing from edge detectors to marking more general features in images had to do with terminology in the field.
A feature detector is an algorithm that finds points of interest, like edges or corners.
A feature extractor goes more in depth: it computes a descriptor for the region around each point, so that features can be matched between images.
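
In OpenCV the split is literal; a small sketch (FAST as the detector, ORB as the descriptor; the filename is a placeholder):

    import cv2

    img = cv2.imread('scene.jpg', cv2.IMREAD_GRAYSCALE)  # placeholder filename

    # Detector: finds interest points (here, FAST corners).
    detector = cv2.FastFeatureDetector_create()
    keypoints = detector.detect(img, None)

    # Extractor: computes a descriptor vector for each point's neighborhood
    # (here, ORB), which is what gets matched between images.
    extractor = cv2.ORB_create()
    keypoints, descriptors = extractor.compute(img, keypoints)
    print(len(keypoints), descriptors.shape)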
---------
21 March, 2016:
Thoughts on using shaders, OpenCL, or CUDA: a typical 2.5 GHz single-core processor does about 10 GFLOPS (billions of floating point operations per second), and a graphics card usually has more. For instance, the Raspberry Pi has a low-powered, low-end GPU in it, and even that GPU does 25 GFLOPS. If you can use both the processor and the GPU at the same time, you can obviously add the two together. That would mean even a really low-end computer like the RPi could be surprisingly fast.
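
The back-of-the-envelope version (the FLOPs-per-cycle figure is my assumption; real numbers vary by chip):

    # Peak GFLOPS is roughly cores * clock (GHz) * FLOPs per cycle.
    cpu_gflops = 1 * 2.5 * 4       # one 2.5 GHz core at 4 FLOPs/cycle ~= 10 GFLOPS
    gpu_gflops = 25                # rough figure for the RPi's VideoCore GPU
    print(cpu_gflops + gpu_gflops, 'GFLOPS with both kept busy')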
---------
(political, non-technical): 5 Mar, 2016:
The comments on this ArsTechnica article are great: http://arstechnica.com/information-technology/2016/03/dod-officials-say-autonomous-killing-machines-deserve-a-look/?comments=1&start=40

There's a comment with a link to a story about a war between two chimpanzee communities studied by Jane Goodall (basically saying that violence is part of our nature): https://en.wikipedia.org/wiki/Gombe_Chimpanzee_War

One thing that strikes me most is this idea of a "moral compass" in software. I think what people don't realize is that, because computers are just math devices, any technique for giving a program/algorithm morals will actually just be assigning a set value to human life using some equation.

The reasons why I have not volunteered to help these DoD officials are:
1. The open letter regarding autonomous killing machines.
2. The issues of trust regarding all the lies around 9/11 (the foundation for "The War on Terror") and the kinds of decisions they've been making with their current program (the absolutely abysmal ratio of innocents to non-innocents killed, plus the deliberate killing of an underaged US citizen).
3. Their complete lack of any desire to use said technology for good (I was denied on submittal of idea concerning research project for using machine learning for USDA/DoD agricultural partnership).

-------- 25 Feb, 2016:
Saw a post on FB on the Backyard Brains page about a book:
https://www.amazon.com/Neuroscience-Dummies-Frank-Amthor/dp/1118086864/

Some other material I picked up off the Backyard Brains FB page:
1. http://www.brainfacts.org/about-neuroscience/brain-facts-book/
2. https://backyardbrains.com/experiments/EOG
3. From Kevin Crosby's comment: "Here's a history of brain implant technology I compiled. An early draft was credited in 2005 as the basis for the Wikipedia article on the subject, and back in 1997 the Department of Defense ordered me to stand down when I tried to discuss it". http://skewsme.com/tinfoilhat/chapter/brain-implants/


---------

17 Feb, 2016:

I don't remember how I came across this, but combined with software-defined radio and a little hacking, I might be able to do something like this myself... I still have that Parallella with the onboard FPGA and two ARM CPUs. The biggest challenge is dealing with the extremely high frequency (2.4 GHz is 2.4 billion cycles per second). I really liked what he said about having arrays of transmitters (phased arrays) to very quickly aim the radar in an arbitrary direction. I'd much rather have that than an antenna cone.
https://www.youtube.com/watch?v=ztR9mdJ1YWU
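
The steering math behind a phased array is just per-element phase offsets (this is the standard textbook formula, not anything from the video):

    import numpy as np

    def element_phases(n_elements, spacing_m, steer_deg, freq_hz=2.4e9):
        # To steer the beam, delay each element so its wavefront lines up in the
        # desired direction: phase_k = 2*pi * k * d * sin(theta) / lambda.
        c = 3e8                      # speed of light, m/s
        lam = c / freq_hz            # ~12.5 cm at 2.4 GHz
        theta = np.radians(steer_deg)
        k = np.arange(n_elements)
        return 2 * np.pi * k * spacing_m * np.sin(theta) / lam

    # 8 elements at half-wavelength spacing, beam steered 30 degrees off boresight:
    print(element_phases(8, 0.0625, 30.0))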

One of the reasons Google has been so successful with their self-driving cars is that they don't rely only on computer vision; they also have onboard LIDAR. It'd be really neat to get a point cloud from a very cheap set of hardware and overlay images from a cheap webcam on top of it for further object classification.

---------

Using machine learning to evolve muscles / bone (making the foundations for a robotic walker):
https://www.youtube.com/watch?v=z9ptOeByLA4&feature=youtu.be
keywords: soft robotics, morphology, paper: "Flexible Muscle-Based Locomotion for Bipedal Creatures"

--------

1 Feb, 2016:

I was doing a search on Google and Bing for "sort point cloud rotation invariant", and I found this site: http://www.openperception.org/ - the point cloud API is here: http://www.pointclouds.org/ Still learning about computer vision, but slowly converging on an overall understanding of the process. The more I understand, the more APIs and communities I find, which is cool.

--------

29 Jan, 2016: Microsoft just open sourced their neural net toolkit. Seems better documented / more familiar than Facebook's or Google's. https://github.com/Microsoft/CNTK/wiki/Examples

--------

12 Jan, 2016: This is a pretty good post explaining the need for robotics. In the past, I've thought that robots, on the whole, would take away needed jobs, but in fact, there are too many jobs which cannot pay enough, yet which we need filled:
https://medium.com/@gerkey/looking-forward-to-the-robot-economy-1ba4ee1647e3#.jt5mfngv0


--------

29 Dec, 2015: http://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html

--------

24 November 2015:
Best video I've ever seen on neural net construction (Dave Miller):
https://www.youtube.com/watch?v=KkwX7FkLfug

-------

7 Oct, 2015:
All,

I need some help. I'm submitting an idea for the upcoming Airmen Powered by Innovation summit in December. However, I don't know anyone in the DoD or directly working for government who is a computer vision researcher. I know of people working for DARPA who do this, and there are a lot working for universities. I can put together a team for the presentation, provided they work for DoD / government.

Here's my Innovation Summit submission.

I don't think I want to do it alone. While I have a lot of strengths, this area definitely stretches quite a bit past my current skill level; I would say it represents possibly my max potential.



-------

16 Sep, 2015: I just bought a copy of Practical Python and OpenCV! https://www.pyimagesearch.com/practical-python-opencv/ @pyimagesearch

It was $70, but worth it: in the quickstart bundle, Adrian Rosebrock includes an Ubuntu image for a VirtualBox virtual machine. That makes it unnecessary to spend all the time configuring an extra OS and installing Python 3 and OpenCV 3. It also has a 270-page book with code and case studies, plus a bunch of videos. I'm sure the image will save me more time in the future, as I always wreck my OSs.

I've installed OpenCV in the past, and it's a pain getting everything set up initially. However, Python and OpenCV are probably the best way to study computer vision. I think having an image of a pre-configured OS all set up and ready to go is an excellent idea.

3 Sep, 2015:
www.dtic.mil/cgi-bin/GetTRDoc?Location=U2&doc=GetTRDoc.pdf&AD=ADA092604
This is an early project in computer vision; the report was published in 1980, and the project was done at Stanford. They programmed it in BASIC. Pages 21 and 22 demonstrate some serious math fu. This project has everything I'm interested in: structure from motion, estimating a 3D world model, and pathing.

-------

13 Aug, 2015:
Not directly computer vision related, but video codecs are related to how much compression you get and how much overhead it takes to transfer the video files. https://yro.slashdot.org/story/15/08/11/2327221/cisco-developing-royalty-free-video-codec-thor

-------

22 Jul, 2015:
My filter bubble has apparently started to include a lot more computer vision. I've now come across references to Halide three times in various places: in related videos on YouTube, and now in a mention from a recent presentation by the Khronos Group about OpenCL. Halide is apparently a programming language built specifically for image processing and computer vision. I might have to check it out.

On another note, apparently the Parallella's Epiphany coprocessor will soon have an OpenCL API release.
An API (Application Programming Interface) is basically just a group of functions/methods you can call from a program you write. You include/import it at the top of the program and can then call its functions at any point in your code. OpenCL is a standard that lets you run computation on both CPUs and GPUs (graphics processors).
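
To make that concrete, here's a minimal vector-add sketch using the pyopencl bindings (any OpenCL-capable CPU or GPU will do):

    import numpy as np
    import pyopencl as cl

    a = np.random.rand(100000).astype(np.float32)
    b = np.random.rand(100000).astype(np.float32)

    ctx = cl.create_some_context()          # picks a CPU or GPU device
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

    # The kernel itself is C-like OpenCL; each work item handles one element.
    program = cl.Program(ctx, """
        __kernel void add(__global const float *a,
                          __global const float *b,
                          __global float *out) {
            int i = get_global_id(0);
            out[i] = a[i] + b[i];
        }
    """).build()

    program.add(queue, a.shape, None, a_buf, b_buf, out_buf)
    result = np.empty_like(a)
    cl.enqueue_copy(queue, result, out_buf)
    print(np.allclose(result, a + b))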


-------

20 Jul, 2015: A sub/sub-bot combo found a Napoleonic-era shipwreck by accident (they weren't specifically looking for that wreck): http://www.theregister.co.uk/2015/07/18/shipwreck_discovery_sonar_auv_north_carolina/

-------

19 July, 2015: Finally got OpenCV 3.0.0 installed on Slax. This is a good set of instructions; the only difference for Slax is that there's no ld.so.conf.d directory, only an ld.so.conf file, so the one line "/usr/local/lib" goes at the end of that file instead:

http://webcache.googleusercontent.com/search?q=cache:2vOqPlUYfNoJ:www.samontab.com/web/2014/06/installing-opencv-2-4-9-in-ubuntu-14-04-lts/+&cd=1&hl=en&ct=clnk&gl=us

That version of OpenCV just came out last month. Apparently a lot of its routines now automatically use both the GPU and the CPU, speeding them up. Also, you no longer need separate dependencies for accessing video streams/video files.

Comments: So, I've gotten basic wavelets down. New goals: learn about homogeneous coordinates and basis functions in matrices, learn OpenGL, and become a lot more familiar with the SIFT and SURF computer vision algorithms. -- https://www.youtube.com/watch?v=oFVexhcltzE

-------

19 Jul, 2015:
This is awesome: automatic machine-vision correction of video when you're on a video chat, making it always look like you're looking the person on the other side in the eye.
https://www.youtube.com/watch?v=A5QlDfBpNxw

-------

19 Jul, 2015:
https://www.youtube.com/watch?v=Y9K2yeBZS9I
Found this video on YouTube; it shows exactly how to do a 2D Haar wavelet transform, step by step.
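
For my notes, one level of the 2D transform is just pairwise averages and differences along the rows and then the columns (a numpy sketch):

    import numpy as np

    def haar2d_level(img):
        # One level of the 2D Haar transform (unnormalized averages/differences).
        # The image needs even width and height.
        img = img.astype(np.float64)
        # Rows: pairwise averages (low pass) next to pairwise differences (high pass).
        row = np.hstack([(img[:, 0::2] + img[:, 1::2]) / 2.0,
                         (img[:, 0::2] - img[:, 1::2]) / 2.0])
        # Columns: repeat on the row result. The top-left quadrant ends up the
        # half-size approximation; the other three hold the detail coefficients.
        return np.vstack([(row[0::2, :] + row[1::2, :]) / 2.0,
                          (row[0::2, :] - row[1::2, :]) / 2.0])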

-------

17 Jul, 2015:
Some more pictures and the transformed version.

Now to do some more color conversions, then apply a DFT.
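
For the DFT step, numpy's FFT should do the heavy lifting (a sketch on a grayscale image; the filename is a placeholder):

    import numpy as np
    import matplotlib.pyplot as plt

    img = plt.imread('picture.jpg')           # placeholder filename
    gray = img.mean(axis=2) if img.ndim == 3 else img

    # 2D DFT; the shift moves the zero-frequency term to the center for viewing.
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    magnitude = np.log1p(np.abs(spectrum))    # log scale, else the DC term swamps everything

    plt.imshow(magnitude, cmap='gray')
    plt.show()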


-------

17 Jul, 2015:
Coming along. I've been playing with code that takes in images and then graphs them in three dimensions. I finally have some that works decently, after making every mistake in the book.
Comment: (me) That picture is a girl with a world map painted on her face in many different colors. I've copied a piece of code (from Stack Exchange) that maps the colors from RGB onto the jet colormap. I still haven't gotten a good OpenCV install on Slax yet, so I'm using just numpy and matplotlib.
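
Roughly what that looks like with just numpy and matplotlib (a reconstruction of the idea, not my exact code; the filename is a placeholder):

    import numpy as np
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection

    img = plt.imread('face.jpg')               # placeholder filename
    gray = img.mean(axis=2) if img.ndim == 3 else img
    gray = gray[::4, ::4]                       # downsample so the plot stays responsive

    # Graph intensity as height, colored with the jet colormap.
    y, x = np.mgrid[0:gray.shape[0], 0:gray.shape[1]]
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.plot_surface(x, y, gray, cmap=plt.cm.jet)
    plt.show()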

-------

14 Jul, 2015: Software that does this is so neat. My favorite so far is VisualSFM. You take a bunch of pictures around an object or scene and put them in a folder (they need to be in JPG format), run the software, and after a while it constructs a 3D point cloud. You then use something like MeshLab and a modelling program like Blender to get a 3D object. https://stackoverflow.com/questions/7705377/how-to-create-3d-model-from-2d-image

-------

19 Mar, 2015:
Cool, Adapteva has their 16-core computer on Amazon now for pretty cheap. http://www.amazon.com/Adapteva-Parallella-16-…/…/ref=sr_1_1… This was a Kickstarter project a short while back (months ago? I can't remember exactly when). It's about $150 and pretty dang small; it looks about RPi-sized, though I can't tell exactly.
