<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AI in Excel &#8211; NN in XL</title>
	<atom:link href="https://www.richardmaddison.com/tag/ai-in-excel/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.richardmaddison.com</link>
	<description>Richard Maddison</description>
	<lastBuildDate>Mon, 08 Apr 2019 09:20:38 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=5.5.7</generator>
	<item>
		<title>Building a Capsule Net in Excel</title>
		<link>https://www.richardmaddison.com/2019/01/13/building-a-capsule-net-in-excel/</link>
					<comments>https://www.richardmaddison.com/2019/01/13/building-a-capsule-net-in-excel/#comments</comments>
		
		<dc:creator><![CDATA[Richard Maddison]]></dc:creator>
		<pubDate>Sun, 13 Jan 2019 18:02:31 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[AI in Excel]]></category>
		<category><![CDATA[CapsNets]]></category>
		<category><![CDATA[Capsule Nets]]></category>
		<category><![CDATA[Capsule Networks]]></category>
		<category><![CDATA[Capsules]]></category>
		<category><![CDATA[Neural Networks]]></category>
		<category><![CDATA[Neural Networks in Excel]]></category>
		<guid isPermaLink="false">https://www.richardmaddison.com/?p=21515</guid>

					<description><![CDATA[<p>Capsule networks are possibly the biggest advance in neural network design in the last decade. They appear to mimic the human brain far more than convolutional neural networks and move us significantly closer to artificial general intelligence. As a step towards demystifying these new algorithms...</p>
<p>The post <a rel="nofollow" href="https://www.richardmaddison.com/2019/01/13/building-a-capsule-net-in-excel/">Building a Capsule Net in Excel</a> appeared first on <a rel="nofollow" href="https://www.richardmaddison.com">NN in XL</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Capsule networks are possibly the biggest advance in neural network design in the last decade. They appear to mimic the human brain far more than convolutional neural networks and move us significantly closer to artificial general intelligence. As a step towards demystifying these new algorithms I’ve built one on-sheet in Excel.</p>



<ul class="wp-block-gallery columns-3 is-cropped"><li class="blocks-gallery-item"><figure><img loading="lazy" width="396" height="514" src="https://www.richardmaddison.com/wp-content/uploads/2019/01/20190113_225324.gif" alt="" data-id="21540" data-link="https://www.richardmaddison.com/?attachment_id=21540" class="wp-image-21540"/></figure></li><li class="blocks-gallery-item"><figure><img loading="lazy" width="396" height="514" src="https://www.richardmaddison.com/wp-content/uploads/2019/01/20190113_225517.gif" alt="" data-id="21541" data-link="https://www.richardmaddison.com/?attachment_id=21541" class="wp-image-21541"/></figure></li><li class="blocks-gallery-item"><figure><img loading="lazy" width="396" height="514" src="https://www.richardmaddison.com/wp-content/uploads/2019/01/20190113_225759.gif" alt="" data-id="21542" data-link="https://www.richardmaddison.com/?attachment_id=21542" class="wp-image-21542"/></figure></li><li class="blocks-gallery-item"><figure><img loading="lazy" width="396" height="514" src="https://www.richardmaddison.com/wp-content/uploads/2019/01/20190113_230143.gif" alt="" data-id="21543" data-link="https://www.richardmaddison.com/?attachment_id=21543" class="wp-image-21543"/></figure></li><li class="blocks-gallery-item"><figure><img loading="lazy" width="396" height="514" src="https://www.richardmaddison.com/wp-content/uploads/2019/01/20190113_230334-1.gif" alt="" data-id="21544" data-link="https://www.richardmaddison.com/?attachment_id=21544" class="wp-image-21544"/></figure></li><li class="blocks-gallery-item"><figure><img loading="lazy" width="396" height="514" src="https://www.richardmaddison.com/wp-content/uploads/2019/01/20190113_230839.gif" alt="" data-id="21545" data-link="https://www.richardmaddison.com/?attachment_id=21545" class="wp-image-21545"/></figure></li></ul>



<p style="text-align:center"><em>Fig: These GIFs stride back and forth over all 8 dimensions of the linear manifold of various digits while holding the other dimensions constant. They are from a Capsule Net built on-sheet in Excel which learns the linear manifold of the 10 MNIST handwritten digits and then uses these for categorisation of new Digits.</em></p>



<p>Since a turning point in 2012, neural networks have become dominant in the field of machine learning and artificial intelligence (AI). They are so named because they loosely model the structure of neurons in the brain. Nowadays, they form the default approach for computer vision, translation, speech recognition and more. Convolutional Neural Networks (ConvNets) are a sub-class of neural networks and our most powerful tool for image analysis, but only now, after 20 years of incremental improvements and optimization. Capsule Networks (CapsNets) are new, different and significant. In their first incarnation in 2017, they hit or beat state-of-the-art performance benchmarks in several areas. I, for one, think they represent the next step in humanity’s march toward Artificial General Intelligence (AGI). </p>



<p>CapsNets appear to behave more like the brain than ConvNets. Their inventor Geoff Hinton talks about these characteristics in this presentation <a href="https://youtu.be/rTawFwUvnLE">https://youtu.be/rTawFwUvnLE</a>, given shortly after he released the paper in late 2017. Mimicking the human brain is a promising route to understanding and developing a theory of intelligence. An analogy is the development of a theory of aerodynamics by initially studying birds. The current fruits of that initial approach are aircraft that can circle the earth in 6 hours or carry 500 passengers across the Atlantic. With a similar trajectory, we can only wonder what a theory of intelligence will yield.</p>



<p>Over the last two years I’ve been trying to master and implement neural nets in my work. To help me get up to speed, I’ve been building them in Microsoft Excel. This is slow but gives me a different and intuitive way to see how they work, and given that some of these neural networks were considered almost magical and certainly state-of-the-art until quite recently, building them in Excel is quite demystifying. I’ve built several relatively large neural networks and posted quite a few online. This batch-run ConvNet with Adam optimization hits recent benchmarks for recognizing human handwriting <a href="https://youtu.be/OP7wi2MoSeM">https://youtu.be/OP7wi2MoSeM</a> and should give you a flavour.</p>



<p>CapsNets are exciting and look to me like a massive development in AI that brings us closer to understanding how the human brain works or at least some of the key maths that play a part in human learning and intelligence. I say this because they learn, fail and succeed in far more human ways than do ConvNets.</p>






<ul><li>Visual twists, shifts, squeezes and expansions of objects (affine transforms) put a CapsNet off the scent far less than a ConvNet. Think of those verification “captchas” that websites present you with to prove you’re human. We are far better at recognizing distorted digit captures than ConvNets. For a ConvNet to do the same it would have had to be trained on some similar distortion before whereas a CapsNet can extrapolate along those transformations more easily. This also makes them far better at recognizing 3D objects from different viewpoints than ConvNets.</li><li>Humans learn patterns that represent canonical objects with very few examples or rather instructions.&nbsp; You don’t need to subject a child to 60,000 examples of handwritten digits before they know the numbers 0 to 9. CapsNets have been trained with as few as 25 interventions. We still need to show them a load of handwritten characters but not actually tell them what these are. The advantages they offer for unsupervised learning are already phenomenal.</li><li>CapsNets effectively take bets on what they are seeing and seek information to confirm this. This is what the core routing by agreement algorithm does. As proof builds up they instantaneously prune densely connected layers to sparsely connected layers linking lower level features to specific higher-level features. This is analogous to our making an assumption on what we see, imposing a reference frame and seeking information to fill in the rest. This only becomes apparent when we get it wrong as we generally do with trick images like this shadow face: <a href="https://www.youtube.com/watch?v=sKa0eaKsdA0">https://www.youtube.com/watch?v=sKa0eaKsdA0</a></li><li>CapsNets suffer from the human visual problem known as crowding. 
This is where too many examples of the same object occur close together and simply confuse our minds. For example, it is hard to count the separate lines in IIIIIIII; I can’t do that as easily as reading the word seven.</li></ul>






<p>CapsNets are effectively a vectorized version of ConvNets. In ConvNets, each neuron in a layer gives the probability of the presence of a feature defined by its kernel. CapsNets do more: they convey not only the presence but the “pose” of the feature. By pose we could mean the scale, skew, rotation, viewpoint etc. The routing-by-agreement algorithm in a CapsNet assesses the match between the pose of lower-level features and the features in the level above, say bits of digits to a whole digit or components of a face to a whole face. When these agree, e.g. the eyes, mouth and nose elements all correspond to a face of size X looking left, we get an indication that the higher-level feature is present. When the poses of many lower-level features match a single higher-level feature, we can be very certain that the higher-level feature exists. If this is hard to digest, have a look at the Hinton or Géron videos I reference later. These may help.</p>
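<p>For the curious, here is a minimal NumPy sketch of the routing-by-agreement loop from Hinton’s paper. The shapes match my reduced model of 100 primary and 10 digit capsules; the function and variable names are my own, not from any framework, and the routing logits and iteration count follow the paper’s description:</p>

```python
import numpy as np

def squash(s, axis=-1):
    # Vector nonlinearity: preserves direction, maps length into [0, 1).
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + 1e-9)

def route(u_hat, iters=3):
    # u_hat: (n_lower, n_upper, dim) prediction vectors from lower capsules.
    n_lower, n_upper, _ = u_hat.shape
    b = np.zeros((n_lower, n_upper))                          # routing logits
    for _ in range(iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over upper caps
        s = (c[..., None] * u_hat).sum(axis=0)                # weighted sum per upper cap
        v = squash(s)                                         # (n_upper, dim) outputs
        b += np.einsum('ijd,jd->ij', u_hat, v)                # agreement strengthens route
    return v

rng = np.random.default_rng(0)
u_hat = rng.standard_normal((100, 10, 8))  # 100 primary caps predicting 10 digit caps
v = route(u_hat)
```

<p>The length of each output vector then acts as the probability that the corresponding digit is present.</p>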



<p>This blog post is about building a full capsule net in Excel for handwritten digit classification using MNIST data. MNIST is a data set of handwritten digits provided by Yann LeCun and a staple in data science. Given the novelty of this new algorithm there is not much information available on the net, but the best I came across was Geoff Hinton’s original paper <a href="https://arxiv.org/pdf/1710.09829v2.pdf">https://arxiv.org/pdf/1710.09829v2.pdf</a> and Aurélien Géron’s Keras code and associated videos <a href="https://youtu.be/pPN8d0E3900">https://youtu.be/pPN8d0E3900</a>. I also found this talk by Dr Charles Martin helpful: <a href="https://youtu.be/YqazfBLLV4U">https://youtu.be/YqazfBLLV4U</a>.</p>



<p>&#8212;&#8212;- The remainder of this blog post improves but is probably only of interest to nerds and insomniacs. &#8212;&#8212;-</p>



<p>CapsNets are exciting and potentially far more powerful than standard convolutional neural nets because: </p>






<ul><li>They don’t lose information via subsampling or max pooling, which is the ConvNet way to introduce some invariance; CapsNet weights encode viewpoint-invariant knowledge and are equivariant.</li><li>Through the above approach, they know the “pose” of parts and the whole, which allows them to extrapolate their understanding of geometric relationships to radically new viewpoints (equivariance).</li><li>They have built-in knowledge of the relationship of parts to a whole.</li><li>They contain the notion of entities, and those GIFs at the top of the blog represent movement along one dimension of the linear manifold of the top-level entity that represents an eight.</li></ul>






<p>The Hinton &amp; Géron resources were superb for the forward model, i.e. the algorithm that identifies categories based on trained parameters, which is the true innovation of the CapsNet. However, information on the backward model, the mechanism by which it learns, was sparse and scattered. This is not surprising because back-propagation of the gradient of the loss function with respect to the parameters of each layer is mathematically so straightforward that the deep learning frameworks of TensorFlow and Keras do this automatically. However, wiring the chain rule backwards through the twists and turns of an Excel-based CapsNet architecture was a challenge. This was largely because instead of reading through the theory first, I guessed, fiddled and played until, in exasperation, I looked harder for the “right” approach. I certainly learned a lot about how not to do it and it’s possible that I turned up some novel ideas, but above all, when I eventually “got it right”, the theory sunk in and meant a lot more to me.</p>



<p><strong>How a capsule net works</strong></p>



<p>I’ve uploaded a video walkthrough of the Excel model here: <a href="https://youtu.be/4uiFJZjw6fU">https://youtu.be/4uiFJZjw6fU</a> It&#8217;s probably not for the casual reader but is a more visual way to see what&#8217;s happening and also covers a lot of the issues I&#8217;ve written about in this blog.</p>



<p>A big difference between CapsNets and standard neural networks is that CapsNets contain the notion of entities with pose parameters i.e. the network identifies component parts (lower level capsules) and determines if their pose parameters match those of the higher-level capsules where these parts are combined. Capsules require multiple dimensions to convey their pose and the diagram below shows where the additional dimensions appear:</p>



<div class="wp-block-image"><figure class="aligncenter"><img loading="lazy" width="1024" height="731" src="https://www.richardmaddison.com/wp-content/uploads/2019/01/Schematic-shwoing-8D-Capsule-Transition-1-1024x731.jpg" alt="" class="wp-image-21525" srcset="https://www.richardmaddison.com/wp-content/uploads/2019/01/Schematic-shwoing-8D-Capsule-Transition-1-1024x731.jpg 1024w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Schematic-shwoing-8D-Capsule-Transition-1-300x214.jpg 300w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Schematic-shwoing-8D-Capsule-Transition-1-768x548.jpg 768w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Schematic-shwoing-8D-Capsule-Transition-1-700x500.jpg 700w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Schematic-shwoing-8D-Capsule-Transition-1-1100x785.jpg 1100w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Schematic-shwoing-8D-Capsule-Transition-1.jpg 1471w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption>Fig: This schematic shows the transition from scalar values by neuron at layer 2 to 8-dimensional vectors that represent the capsules in Layer 3.</figcaption></figure></div>



<p>The CapsNet I built is like the structure in Hinton’s paper but quite a bit smaller, with 5&#215;5 kernels for 4 &amp; 8 channels in the first two layers and 8-dimensional digit capsules. This gives only 100 x 8D primary capsules and 10 x 8D digit capsules. The results for this are still impressive, i.e. 98.7% accuracy rather than the 99.5% accuracy that we see with 1152 x 8D primary and 10 x 16D digit capsules in the paper. I chose this reduction after paring down the Keras model to a size that would be manageable in Excel without too much Excel build optimization. The structure of my forward CapsNet, or rather a screenshot of the actual CapsNet as it appears in my Excel spreadsheet, is below.</p>



<div class="wp-block-image"><figure class="aligncenter"><img loading="lazy" width="1024" height="574" src="https://www.richardmaddison.com/wp-content/uploads/2019/01/Excel-View-of-Forward-Net-1-1024x574.jpg" alt="" class="wp-image-21536" srcset="https://www.richardmaddison.com/wp-content/uploads/2019/01/Excel-View-of-Forward-Net-1-1024x574.jpg 1024w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Excel-View-of-Forward-Net-1-300x168.jpg 300w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Excel-View-of-Forward-Net-1-768x430.jpg 768w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Excel-View-of-Forward-Net-1-700x392.jpg 700w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Excel-View-of-Forward-Net-1-1100x617.jpg 1100w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Excel-View-of-Forward-Net-1.jpg 1504w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption>Fig: View of the forward CapsNet built on-sheet in Excel</figcaption></figure></div>



<p>The backpropagation or learning mechanism is much bigger. In the figure below, I’ve put together several screenshots of the entire spreadsheet model. This covers 1000 rows and 7500 columns. The bulk of the area relates to the decoder sections with their 784 neurons in the final layer and the Adam optimization I used to speed up learning. I’ve highlighted the big blue collection of layer 3 transform matrices on this to give you an indication of the size relative to the forward model above, and of the additional calculations and complexity required for the backward pass.</p>



<div class="wp-block-image"><figure class="aligncenter is-resized"><img loading="lazy" src="https://www.richardmaddison.com/wp-content/uploads/2019/01/Excel-View-of-Full-Net-1024x182.jpg" alt="" class="wp-image-21527" width="580" height="103" srcset="https://www.richardmaddison.com/wp-content/uploads/2019/01/Excel-View-of-Full-Net-1024x182.jpg 1024w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Excel-View-of-Full-Net-300x53.jpg 300w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Excel-View-of-Full-Net-768x137.jpg 768w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Excel-View-of-Full-Net-700x125.jpg 700w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Excel-View-of-Full-Net-1100x196.jpg 1100w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Excel-View-of-Full-Net.jpg 1506w" sizes="(max-width: 580px) 100vw, 580px" /><figcaption>Fig: Excel Screenshot of the full Capsule Net spreadsheet with the lengthy decoder network and multiple parameter optimization blocks.</figcaption></figure></div>



<h2>The Process of Building</h2>



<p>Now that I know what I’m doing, I could probably mechanically build this in a couple of days or modify the size of a layer in an hour or so. However, if I built it in Keras it would take a couple of hours, and modifying a layer would take seconds. Now I understand what I’m building, but the initial build took several months and even understanding Aurélien Géron’s Keras code took days.</p>



<p>My initial approach was to build only the forward model and feed this with pre-trained parameters from a modified version of Aurélien Géron’s code. My reduced spec (L1: 5x5K, 4c, L2: 5&#215;5, 8c s2 L3: 8Dx10caps, Decoder 50, 50, 784) took the parameter count down from the 44,489,744 of the original paper that Aurélien had replicated to a more Excel-manageable 127,943. Keras trials on this gave an MNIST test result of 98.69%, higher than I could regularly obtain with my Excel ConvNets but way below the 99.43% that the bigger model achieves.</p>



<p>Another important modification I made to get a clear comparison was to initially train and test the reduced-size Keras model only on the 10k MNIST data set. This reached a 100% overfit after about 34 epochs. What I mean by this is that the model was able to learn the 10k data set to 100% accuracy, i.e. the model could store sufficient information in its parameters to categorize all 10k MNIST digits correctly. This is useless as a generalizable model but gave me an easy test to see if the Keras-trained parameters, when transferred to Excel, would deliver the same result on the same test set.</p>



<p>I was on several steep learning curves throughout this process and was delighted when I eventually got a perfect match. However, as I added the backward model and learned from these already-overfit parameters, the model’s precision collapsed to 30% or so and only then began to learn. I saw more failure modes than I can recall and, given the slow speed at which the Excel model learned, had plenty of time to hypothesize the causes.</p>



<div class="wp-block-image"><figure class="aligncenter"><img loading="lazy" width="1024" height="731" src="https://www.richardmaddison.com/wp-content/uploads/2019/01/Examples-of-Learning-Curves-1024x731.jpg" alt="" class="wp-image-21528" srcset="https://www.richardmaddison.com/wp-content/uploads/2019/01/Examples-of-Learning-Curves-1024x731.jpg 1024w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Examples-of-Learning-Curves-300x214.jpg 300w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Examples-of-Learning-Curves-768x548.jpg 768w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Examples-of-Learning-Curves-700x500.jpg 700w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Examples-of-Learning-Curves-1100x785.jpg 1100w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Examples-of-Learning-Curves.jpg 1471w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption>Fig: A small selection of the model’s precision curves as the bugs dropped out.</figcaption></figure></div>



<p>I began this process in August of 2018 and eventually confirmed that I had a working CapsNet in Excel on 31-December 2018. The closing stages of using this odd approach of matching to a 100% overfit are summarized below.</p>



<div class="wp-block-image"><figure class="aligncenter"><img loading="lazy" width="1024" height="731" src="https://www.richardmaddison.com/wp-content/uploads/2019/01/Test-By-Learn-from-Overfit-1024x731.jpg" alt="" class="wp-image-21529" srcset="https://www.richardmaddison.com/wp-content/uploads/2019/01/Test-By-Learn-from-Overfit-1024x731.jpg 1024w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Test-By-Learn-from-Overfit-300x214.jpg 300w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Test-By-Learn-from-Overfit-768x548.jpg 768w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Test-By-Learn-from-Overfit-700x500.jpg 700w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Test-By-Learn-from-Overfit-1100x785.jpg 1100w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Test-By-Learn-from-Overfit.jpg 1471w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption>Fig: The figure above was sufficient proof to ensure that my Excel backward pass wiring was functionally the same as Aurélien Géron’s Keras code.</figcaption></figure></div>



<p>Once I had confirmation that it was eventually working, adding the full 60k digit training data, loading 50 epoch Keras trained parameters and running a 10k test in Excel that matched the 98.69% Keras Test result took no time. I headed out happy that night to celebrate the new year.  </p>



<h2>Interesting Learning</h2>



<p><strong>Triple Axel</strong></p>



<p>One challenge that I faced was wiring the backpropagation of the convolutions. Though I now know this to be straightforward, I went through the process without really thinking through the maths or approach and miraculously ended up with the right answer. On further research I found that this obscure transpose and flip of the kernel over its anti-diagonal is apparently called a Triple Axel, named after a figure skater from the 19<sup>th</sup> century. This is according to User1551 on math.stackexchange; though I can’t find any other evidence, I love the idea and am happy to propagate the meme.</p>



<p>In TensorFlow and, as I understand it, basically all other code, the same approach is handled with a sparse weight-sharing matrix, such that the reverse path through the matrix multiplication can simply be accomplished with a transpose of this matrix to get the same connectivity.</p>



<p>In Excel, the Triple Axel transformation of the
kernel is much easier to code, use and audit, so makes for a nice approach.</p>



<div class="wp-block-image"><figure class="aligncenter"><img loading="lazy" width="1024" height="618" src="https://www.richardmaddison.com/wp-content/uploads/2019/01/Triple-Axel-B-1024x618.png" alt="" class="wp-image-21530" srcset="https://www.richardmaddison.com/wp-content/uploads/2019/01/Triple-Axel-B.png 1024w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Triple-Axel-B-300x181.png 300w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Triple-Axel-B-768x464.png 768w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Triple-Axel-B-700x422.png 700w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption>Fig: The above figure shows the different approach to convolutional matrix multiplication on the forward and backward path for Excel v Python.</figcaption></figure></div>



<p><strong>Backpropagation through a stride</strong></p>



<p>I made many attempts at getting this to work before I began looking for a proper explanation. The best I came across was “A guide to convolution arithmetic for deep learning” by Vincent Dumoulin and Francesco Visin from the Institut des algorithmes d’apprentissage de Montréal. I would check their paper out if you’re confused.</p>



<p>Ultimately the wiring in Excel for this is very
straightforward and simply requires interspersing zeros in the post-stride
channel to bring the size of the channel up to the pre-stride size as shown
below. The re-shaping from the 8D capsule gradients was also straightforward
and the figure below shows how I unrolled these capsules into the convolutional
channels. Again, I tried all sorts of approaches to this simple unfurling of
the channels and arrived at the correct one by chance.</p>



<div class="wp-block-image"><figure class="aligncenter is-resized"><img loading="lazy" src="https://www.richardmaddison.com/wp-content/uploads/2019/01/Backprop-through-a-stride-B.png" alt="" class="wp-image-21531" width="392" height="257" srcset="https://www.richardmaddison.com/wp-content/uploads/2019/01/Backprop-through-a-stride-B.png 781w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Backprop-through-a-stride-B-300x197.png 300w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Backprop-through-a-stride-B-768x504.png 768w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Backprop-through-a-stride-B-700x460.png 700w" sizes="(max-width: 392px) 100vw, 392px" /><figcaption>Fig: The figure above shows the path of backpropagation through a stride and from the 8D capsules to one of the 8 channels of my convolutional layer 2.</figcaption></figure></div>



<p><strong>Backprop through the affine transforms</strong></p>



<p>I spent some time working through ways to build the derivative of the layer 2 output function dZ<sup>[2]</sup>. This comprises a sum of the matrix products of each transformation matrix by dZ<sup>[3]</sup>, routed via the derivative of the layer 2 activation function, i.e. only pass the gradient back if the U<sub>i</sub> is greater than zero.</p>



<div class="wp-block-image"><figure class="aligncenter"><img loading="lazy" width="1024" height="257" src="https://www.richardmaddison.com/wp-content/uploads/2019/01/dZ-2-creation-1024x257.png" alt="" class="wp-image-21533" srcset="https://www.richardmaddison.com/wp-content/uploads/2019/01/dZ-2-creation-1024x257.png 1024w, https://www.richardmaddison.com/wp-content/uploads/2019/01/dZ-2-creation-300x75.png 300w, https://www.richardmaddison.com/wp-content/uploads/2019/01/dZ-2-creation-768x193.png 768w, https://www.richardmaddison.com/wp-content/uploads/2019/01/dZ-2-creation-700x175.png 700w, https://www.richardmaddison.com/wp-content/uploads/2019/01/dZ-2-creation-1100x276.png 1100w, https://www.richardmaddison.com/wp-content/uploads/2019/01/dZ-2-creation.png 1536w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption>Fig: This is a view of the wiring for the derivative of the gradient of the layer 2 output function for the second primary capsule of 100.</figcaption></figure></div>



<p><strong>Margin loss &amp; “Brim Loss?”</strong></p>



<p>Generating a dZ<sup>[3]</sup> with the right dimension (8D) to pass back through the affine transforms also caused me some issues. I tried various approaches, but the one that mimicked TensorFlow, and which I therefore assumed to be correct, was simply to multiply the derivative of the loss function by the final digit capsule vectors, i.e. after the vector nonlinearity or squash function.</p>



<p>I used the margin loss quoted in the paper but made some silly mistakes in calculating its derivative that negated the use of the max function: instead of ignoring gradients for activations greater than 90% and less than 10%, it actually penalized high certainty above and below the thresholds. This effectively optimized for uncertainty, or specifically a 90% certainty of true and a 10% certainty of false. An interesting result of this was that the model trained up to the 100% overfit benchmark I was using faster. This approach also potentially introduces additional regularization at little cost in time and code.</p>



<p>Because Excel is so slow I’ve stuck with this approach and, until I find the correct name for it, am calling it a “Brim” loss because the resulting loss curve looks like the brim of a hat. I explain this further in the figure below.</p>



<div class="wp-block-image"><figure class="aligncenter"><img loading="lazy" width="1024" height="731" src="https://www.richardmaddison.com/wp-content/uploads/2019/01/Brim-Loss-1024x731.jpg" alt="" class="wp-image-21552" srcset="https://www.richardmaddison.com/wp-content/uploads/2019/01/Brim-Loss-1024x731.jpg 1024w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Brim-Loss-300x214.jpg 300w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Brim-Loss-768x548.jpg 768w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Brim-Loss-700x500.jpg 700w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Brim-Loss-1100x785.jpg 1100w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Brim-Loss.jpg 1471w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption>Fig: The figure above shows the Margin loss from Geoff Hinton’s paper alongside my made-up “Brim” loss that optimized for uncertainty </figcaption></figure></div>



<p>I ran 20 learning trials over 10 epochs for both the Margin loss and the Brim loss, each with differing seeds for initialization. Multiple trials are the only way to get a rough measure of the advantage that the Brim loss may offer over the Margin loss in this case. The trials below show the learning curves (as precision rather than loss) and the improvement is quite substantial. These were, of course, run in Python as the process would have taken weeks in Excel.</p>
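<p>A minimal sketch of that trial harness, assuming a hypothetical <code>train_fn(seed, n_epochs)</code> that trains one model and returns its per-epoch precision curve:</p>

```python
import numpy as np

def run_trials(train_fn, n_trials=20, n_epochs=10):
    """Run independent training trials with different seeds and aggregate
    the per-epoch precision curves into a mean and standard deviation,
    as plotted in the figure below."""
    curves = np.array([train_fn(seed, n_epochs) for seed in range(n_trials)])
    return curves.mean(axis=0), curves.std(axis=0)
```
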



<figure class="wp-block-image"><img loading="lazy" width="1024" height="556" src="https://www.richardmaddison.com/wp-content/uploads/2019/01/Brim-and-Margin-Trails-1-1024x556.png" alt="" class="wp-image-21578" srcset="https://www.richardmaddison.com/wp-content/uploads/2019/01/Brim-and-Margin-Trails-1-1024x556.png 1024w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Brim-and-Margin-Trails-1-300x163.png 300w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Brim-and-Margin-Trails-1-768x417.png 768w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Brim-and-Margin-Trails-1-700x380.png 700w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Brim-and-Margin-Trails-1-1100x597.png 1100w, https://www.richardmaddison.com/wp-content/uploads/2019/01/Brim-and-Margin-Trails-1.png 1152w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption>20 trials of Brim v Margin loss learning, with the mean precision by epoch in heavy solid, 1-standard-deviation lines dotted and individual 10-epoch trials in thin solid. I added a copy of the Brim mean to the Margin chart for comparison.</figcaption></figure>



<p><strong>The Next Steps </strong></p>



<p>If I carry this further in Excel I think the
next step will be to introduce an innate graphics model along the lines of “Extracting
pose information by using a domain specific decoder” by Navdeep Jaitly &amp; Tijmen Tieleman. This will allow the model to run unsupervised learning to go from
pixels to entities with poses, and opens the ability to train on MNIST with
only a handful of supervised inputs. </p>



<p>I’m also keen to explore Matrix capsules, EM routing, and running on the smallNORB data set, and of course optimizing the spreadsheet to run more quickly, perhaps making use of Excel’s iterative calculation functions.</p>

<p>I’ll update this blog as I make progress but would welcome any encouragement, tips and corrections.</p>
<p>The post <a rel="nofollow" href="https://www.richardmaddison.com/2019/01/13/building-a-capsule-net-in-excel/">Building a Capsule Net in Excel</a> appeared first on <a rel="nofollow" href="https://www.richardmaddison.com">NN in XL</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.richardmaddison.com/2019/01/13/building-a-capsule-net-in-excel/feed/</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
			</item>
		<item>
		<title>A Neural Network in Excel learning the difference between fighting and dancing.</title>
		<link>https://www.richardmaddison.com/2018/06/16/neural-network-excel-learning-difference-fighting-dancing/</link>
					<comments>https://www.richardmaddison.com/2018/06/16/neural-network-excel-learning-difference-fighting-dancing/#comments</comments>
		
		<dc:creator><![CDATA[Richard Maddison]]></dc:creator>
		<pubDate>Sat, 16 Jun 2018 04:35:54 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[AI in Excel]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Machine Learning in Excel]]></category>
		<category><![CDATA[Neural Networks in Excel]]></category>
		<guid isPermaLink="false">https://www.richardmaddison.com/?p=21488</guid>

					<description><![CDATA[<p>I’ve been building Neural Networks in Excel for a few months and have been looking for a data set that would capture something too hard for humans to explain but easy for us to identify. Basically, a classification that we can recognize instantly but can’t...</p>
<p>The post <a rel="nofollow" href="https://www.richardmaddison.com/2018/06/16/neural-network-excel-learning-difference-fighting-dancing/">A Neural Network in Excel learning the difference between fighting and dancing.</a> appeared first on <a rel="nofollow" href="https://www.richardmaddison.com">NN in XL</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>I’ve been building Neural Networks in Excel for a few months and have been looking for a data set that would capture something too hard for humans to explain but easy for us to identify. Basically, a classification that we can recognize instantly but can’t easily explain. I also wanted as few dimensions in the data as possible to keep the Excel light. The slightly silly problem that I came up with was our ability to see a relationship between two randomly generated stick figures placed side by side. There are plenty of interpretations we can put on their relationship but the classes I came up with were fighting or dancing. For example:</p>
<p>I think these are Dancing:</p>
<p><img loading="lazy" class="wp-image-21489 aligncenter" src="https://www.richardmaddison.com/wp-content/uploads/2018/06/Figure-1-Dancing-Images-300x63.png" alt="" width="872" height="183" srcset="https://www.richardmaddison.com/wp-content/uploads/2018/06/Figure-1-Dancing-Images-300x63.png 300w, https://www.richardmaddison.com/wp-content/uploads/2018/06/Figure-1-Dancing-Images-768x162.png 768w, https://www.richardmaddison.com/wp-content/uploads/2018/06/Figure-1-Dancing-Images-700x148.png 700w, https://www.richardmaddison.com/wp-content/uploads/2018/06/Figure-1-Dancing-Images.png 1008w" sizes="(max-width: 872px) 100vw, 872px" /></p>
<p>I think these are Fighting:</p>
<p><img loading="lazy" class="wp-image-21490 aligncenter" src="https://www.richardmaddison.com/wp-content/uploads/2018/06/Figure-2-Fighting-Images-300x74.png" alt="" width="872" height="215" srcset="https://www.richardmaddison.com/wp-content/uploads/2018/06/Figure-2-Fighting-Images-300x74.png 300w, https://www.richardmaddison.com/wp-content/uploads/2018/06/Figure-2-Fighting-Images-768x189.png 768w, https://www.richardmaddison.com/wp-content/uploads/2018/06/Figure-2-Fighting-Images-700x172.png 700w, https://www.richardmaddison.com/wp-content/uploads/2018/06/Figure-2-Fighting-Images.png 1017w" sizes="(max-width: 872px) 100vw, 872px" /></p>
<p>The plots are interesting because there are many different ways that people dance or fight and our brains can usually see these, but can you describe the general difference?</p>
<p>I thought not, but given the unreasonable effectiveness of neural networks at recognizing patterns, maybe they could figure out what that difference was.</p>
<p><img loading="lazy" class="wp-image-21491 aligncenter" src="https://www.richardmaddison.com/wp-content/uploads/2018/06/Figure-3-Fighting-Dancing-Patterns-300x156.jpg" alt="" width="677" height="352" srcset="https://www.richardmaddison.com/wp-content/uploads/2018/06/Figure-3-Fighting-Dancing-Patterns-300x156.jpg 300w, https://www.richardmaddison.com/wp-content/uploads/2018/06/Figure-3-Fighting-Dancing-Patterns-768x399.jpg 768w, https://www.richardmaddison.com/wp-content/uploads/2018/06/Figure-3-Fighting-Dancing-Patterns-700x364.jpg 700w, https://www.richardmaddison.com/wp-content/uploads/2018/06/Figure-3-Fighting-Dancing-Patterns.jpg 850w" sizes="(max-width: 677px) 100vw, 677px" /></p>
<p>These plots show how the cases differ for those labelled by me as dancing and separately, to the right, as fighting. There really is nothing obvious here.</p>
<p>The spreadsheet generates stick figure couples with 10 dots per figure and I’ve laboriously picked out and labelled 600 cases where they appear to me to be either fighting or dancing. The spreadsheet contains two on-sheet fully connected neural networks with 2 layers of 15 and 10 neurons, ending in a sigmoid binary classifier and looking at 24 XY coordinates in a flat 48-dimensional vector (there are 2 repeated dots: the neck and the pelvis – wasteful but I was lazy). The first Neural Net runs 5 batches of 100 cases each and learns the difference between Fighting and Dancing, and the second runs stochastic gradient descent on a single case at a time using the learned parameters of the first, but looking at a separate test dataset of 100 fresh, unseen labelled cases.</p>
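<p>A minimal NumPy sketch of the forward pass for this architecture (48 inputs, hidden layers of 15 and 10 neurons, a single sigmoid output). The ReLU hidden activations and the 0.1 weight scale are assumptions; the post doesn’t specify them.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Shapes mirror the spreadsheet layout: 48 inputs -> 15 -> 10 -> 1 sigmoid.
# ReLU hidden activations and the 0.1 init scale are assumptions.
W1, b1 = rng.standard_normal((15, 48)) * 0.1, np.zeros((15, 1))
W2, b2 = rng.standard_normal((10, 15)) * 0.1, np.zeros((10, 1))
W3, b3 = rng.standard_normal((1, 10)) * 0.1, np.zeros((1, 1))

def forward(X):
    """X: (48, m) batch of m stick-figure pairs (24 XY coordinates each).
    Returns a (1, m) probability of 'fighting'."""
    A1 = relu(W1 @ X + b1)
    A2 = relu(W2 @ A1 + b2)
    return sigmoid(W3 @ A2 + b3)
```

<p>In the workbook each of these matrix products is a visible block of cells rather than a line of code, but the arithmetic is identical.</p>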
<p>Excel lends itself well to building and understanding Neural Networks because it requires an explicit physical layout for the layers within the model: they are not hidden in virtual arrays defined by for-loops but displayed as actual blocks of numbers. The plot below shows the core of the model’s “brain”, the hidden layers and the parameter weights that, once trained, capture the function that we humans each develop to distinguish between aggression and joy. The Momentum and Gradient layers give you a view of how the spreadsheet learns. The gradients of the function are derived directly from a case (or 100 cases for this vectorized model) run through the current learned parameters. The gradients give the direction of travel required to update the parameters of each layer in order to reduce the loss function, i.e. the difference between my classification of a particular case and the spreadsheet’s current understanding. If left to train for an hour or so with a small enough learning rate, the model will learn the training cases to an accuracy of 100%. However, this would be an example of overfitting and would not capture the essence of fighting or dancing and therefore not generalize well to new cases.</p>
<p><img loading="lazy" class=" wp-image-21499 aligncenter" src="https://www.richardmaddison.com/wp-content/uploads/2018/06/Layers-1-300x173.jpg" alt="" width="500" height="288" srcset="https://www.richardmaddison.com/wp-content/uploads/2018/06/Layers-1-300x173.jpg 300w, https://www.richardmaddison.com/wp-content/uploads/2018/06/Layers-1-345x198.jpg 345w, https://www.richardmaddison.com/wp-content/uploads/2018/06/Layers-1.jpg 528w" sizes="(max-width: 500px) 100vw, 500px" /></p>
<p>Excel has some real advantages for coding neural nets, beyond the fact that it’s the tool I know and I’m still only semi-literate in Python. I think these advantages are as follows:</p>
<ul>
<li>They lay the structure bare. All layers, all neurons and all connection weights are exposed, no searching or displaying the results of nested For loops, just 2D blocks of numbers.</li>
<li>When coding you make fundamental wiring errors, and these are instructive. They would not be so helpful if you knew where you were going, but at this stage neural networks are still a black art. In all the training I’ve done to date, the experts use the phrase “this works well but we don’t know why”.</li>
<li>It&#8217;s slow to train, and you see it train in minute, step-by-step detail. This gives you time to think and ponder what’s going on, what’s wrong and what’s just plain peculiar.</li>
</ul>
<p>If you’re interested in this, please let me know and I would be happy to share more with you. I’d certainly welcome any questions and queries you may have and will be releasing some of the base models on GitHub. Any expressions of interest prior to this would also be welcome.</p>
<p>I&#8217;ve uploaded a YouTube video of the model running here: <a href="https://youtu.be/ua27l-nZ944">https://youtu.be/ua27l-nZ944</a> and you can see more related Neural Network in Excel material on my channel here: <a href="https://www.youtube.com/channel/UC44Q4IXVrU6qUtezNU9X_Uw?view_as=subscriber">https://www.youtube.com/</a></p>
<p>This video and any associated material, if released, are available to you under a Creative Commons Attribution-NonCommercial 4.0 International License, the details of which can be found here: <a href="https://creativecommons.org/licenses/by-nc/4.0/legalcode">https://creativecommons.org/licenses/by-nc/4.0/legalcode</a></p>
<p>The post <a rel="nofollow" href="https://www.richardmaddison.com/2018/06/16/neural-network-excel-learning-difference-fighting-dancing/">A Neural Network in Excel learning the difference between fighting and dancing.</a> appeared first on <a rel="nofollow" href="https://www.richardmaddison.com">NN in XL</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.richardmaddison.com/2018/06/16/neural-network-excel-learning-difference-fighting-dancing/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>Building Convolutional Neural Networks in Excel</title>
		<link>https://www.richardmaddison.com/2018/05/03/building-convolutional-neural-networks-excel/</link>
					<comments>https://www.richardmaddison.com/2018/05/03/building-convolutional-neural-networks-excel/#comments</comments>
		
		<dc:creator><![CDATA[Richard Maddison]]></dc:creator>
		<pubDate>Thu, 03 May 2018 16:31:01 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI in Excel]]></category>
		<category><![CDATA[Building Convolutional Neural Networks in Excel]]></category>
		<category><![CDATA[Convolutional Neural Networks in Excel]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Machine Learning in Excel]]></category>
		<category><![CDATA[Neural Networks]]></category>
		<guid isPermaLink="false">https://www.richardmaddison.com/?p=21460</guid>

					<description><![CDATA[<p>This blog relates to work I’ve done in Excel to build a handwritten digit classifier; basically, a spreadsheet that can read handwriting up to human levels of accuracy. This required a convolutional neural network &#8211; the engine behind just about all machine learning related to...</p>
<p>The post <a rel="nofollow" href="https://www.richardmaddison.com/2018/05/03/building-convolutional-neural-networks-excel/">Building Convolutional Neural Networks in Excel</a> appeared first on <a rel="nofollow" href="https://www.richardmaddison.com">NN in XL</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>This blog relates to work I’ve done in Excel to build a handwritten digit classifier; basically, a spreadsheet that can read handwriting up to human levels of accuracy. This required a convolutional neural network &#8211; the engine behind just about all machine learning related to images. I’m unaware of anyone else who has done this in Excel so please let me know if you come across others.</p>
<p>I have been deeply involved with financial analysis and mathematical models for most of my career but began re-tooling with machine learning over the last two years.  I’m currently working through Andrew Ng’s brilliant Deep Learning course on Coursera and I’ve reached Course 4, Week 3. Throughout the course, I’ve been building out the neural net architectures he describes in Excel.  Excel is not “yet” the right medium to build convolutional neural nets for real-world applications. However, I know Excel and find it easier to construct these on a spreadsheet rather than in a new language. I certainly hope this state won’t last long.</p>
<p>Building in Excel may be slow, but it makes neural nets very transparent, in that you are visually confronted with the layers, neurons and their associated parameters and related calculations. Excel gives a less abstract view of a neural net than vectorised Python code and it helped me immensely in developing an understanding of these fantastic new tools.</p>
<p>Excel lets you see inside the machine but for me, an even more useful element was the bugs and mis-wiring that I introduced. Neural networks are very robust to bugs; in fact, they often continue to learn but fail in odd and interesting ways. Figuring out the failure based on the learning behaviour really forced me to think about the maths and the structure. I would also argue that the slow speed of Excel gives you time to think as the failures manifest themselves. For a deeper understanding of Neural Nets go to Andrew Ng. My intuitions are still weak but certainly improving.</p>
<p>The classic beginner’s exercise for deep learning is to build an MNIST digit classifier. This relates to Yann LeCun’s data set of 60,000 handwritten digits (0 to 9) with an associated 10,000-digit test set. Yann has made this data available to all and there are plenty of higher-level language examples.</p>
<p>My initial attempts at classifiers using plain vanilla fully connected neural networks are on my YouTube channel below but Mike Pallister was faster off the mark than me and is also worth a look.</p>
<p>Single Digit Classification <a href="https://www.youtube.com/watch?v=4P5r0tT7Hsc&amp;t=33s">https://www.youtube.com/watch?v=4P5r0tT7Hsc&amp;t=33s</a></p>
<p>Batch Digit Classification <a href="https://www.youtube.com/watch?v=bJcv9vi4Gqg&amp;t=6s">https://www.youtube.com/watch?v=bJcv9vi4Gqg&amp;t=6s</a></p>
<p>However, this blog relates to last week’s modelling of a convolutional neural net; again with ADAM optimisation and Batch capability. After several silly but informative mistakes, I completed the model below last night and would welcome your thoughts.</p>
<div style="width: 1060px;" class="wp-video"><!--[if lt IE 9]><script>document.createElement('video');</script><![endif]-->
<video class="wp-video-shortcode" id="video-21460-1" width="1060" height="661" preload="metadata" controls="controls"><source type="video/mp4" src="https://www.richardmaddison.com/wp-content/uploads/2018/05/CNN-in-Excel-Maddison-1.mp4?_=1" /><a href="https://www.richardmaddison.com/wp-content/uploads/2018/05/CNN-in-Excel-Maddison-1.mp4">https://www.richardmaddison.com/wp-content/uploads/2018/05/CNN-in-Excel-Maddison-1.mp4</a></video></div>
<p>The model has very few layers, neurons and parameters; it’s tiny in comparison to real-world examples. By way of a full description, there are two convolutional layers with max pooling taking the images of the handwritten digits from 28h x 28w pixels to 24h x 24w x 4c (4 channels) and 12h x 12w x 4c after max pooling. Layer 2 condensed these to 8h x 8w x 8c and then 4h x 4w x 8c after max pooling. The final two fully connected layers had 15 and 10 neurons respectively, giving a grand total of 936 convolutional parameters and 2,095 in the fully connected layers. This doesn’t sound like a lot of space to capture the vagaries of human handwriting, but it does. On its first 100k iterations I’m seeing 98.75% accuracy on the training data and 98% on the test data. I would love to know what human accuracy levels on this data are but from my experience, it’s not much more than this.</p>
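<p>The fully connected parameter count can be checked directly from the dimensions above. The 5x5 kernel size is inferred from the stated 28-to-24 and 12-to-8 reductions rather than quoted in the post:</p>

```python
def conv_out(size, kernel):
    # output width of a 'valid' convolution with stride 1
    return size - kernel + 1

def dense_params(n_in, n_out):
    # weights plus one bias per output neuron
    return n_in * n_out + n_out

# Conv path: 28 -> conv 5x5 -> 24 -> pool/2 -> 12 -> conv 5x5 -> 8 -> pool/2 -> 4
side = conv_out(conv_out(28, 5) // 2, 5) // 2
flattened = side * side * 8          # 4h x 4w x 8c = 128 values

# Fully connected path: 128 -> 15 -> 10
fc_total = dense_params(flattened, 15) + dense_params(15, 10)
print(fc_total)  # 2095, matching the count quoted above
```
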
<p>I’d certainly welcome any questions and queries you may have and will be releasing some of the base models on GitHub. Any expressions of interest prior to this would also be welcome.</p>
<p>These videos and any associated material, if released, are available to you under a Creative Commons Attribution-NonCommercial 4.0 International Licence, the details of which can be found here: <a href="https://creativecommons.org/licenses/by-nc/4.0/legalcode">https://creativecommons.org/licenses/by-nc/4.0/legalcode</a></p>
<p>The post <a rel="nofollow" href="https://www.richardmaddison.com/2018/05/03/building-convolutional-neural-networks-excel/">Building Convolutional Neural Networks in Excel</a> appeared first on <a rel="nofollow" href="https://www.richardmaddison.com">NN in XL</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.richardmaddison.com/2018/05/03/building-convolutional-neural-networks-excel/feed/</wfw:commentRss>
			<slash:comments>14</slash:comments>
		
		<enclosure url="https://www.richardmaddison.com/wp-content/uploads/2018/05/CNN-in-Excel-Maddison-1.mp4" length="56876931" type="video/mp4" />

			</item>
		<item>
		<title>Neural Networks in Excel – Finding Andrew Ng’s Hidden Circle</title>
		<link>https://www.richardmaddison.com/2018/03/29/neural-networks-in-excel-finding-andrew-ngs-hidden-circle/</link>
					<comments>https://www.richardmaddison.com/2018/03/29/neural-networks-in-excel-finding-andrew-ngs-hidden-circle/#comments</comments>
		
		<dc:creator><![CDATA[dil]]></dc:creator>
		<pubDate>Thu, 29 Mar 2018 11:53:09 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[AI in Excel]]></category>
		<category><![CDATA[Neural Network]]></category>
		<guid isPermaLink="false">https://www.richardmaddison.com/?p=21379</guid>

					<description><![CDATA[<p>I’m currently re-tooling as a data scientist and am halfway through Andrew Ng’s brilliant course on Deep learning in Coursera. I’m a spreadsheet jockey and have been working with Excel for years, but this course is in Python, the lingua franca for deep learning. Hence,...</p>
<p>The post <a rel="nofollow" href="https://www.richardmaddison.com/2018/03/29/neural-networks-in-excel-finding-andrew-ngs-hidden-circle/">Neural Networks in Excel – Finding Andrew Ng’s Hidden Circle</a> appeared first on <a rel="nofollow" href="https://www.richardmaddison.com">NN in XL</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><span style="font-weight: 400;">I’m currently re-tooling as a data scientist and am halfway through Andrew Ng’s brilliant course on Deep learning in Coursera. I’m a spreadsheet jockey and have been working with Excel for years, but this course is in Python, the lingua franca for deep learning. Hence, I found myself struggling not only with the new concepts associated with the subject, but also the syntax of Python – agony.</span></p>
<p><span style="font-weight: 400;">The first programming assignment of Andrew’s second course “Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization” was to build a basic neural network to identify a function that would separate a scatter of red and black dots based on their X, Y coordinates. For a human this is simple: the red dots formed a circle inside the black dots with a bit of random scatter. For an algorithm in Excel, well, before Andrew’s course it was not so easy.</span></p>
<p><img loading="lazy" class="wp-image-21480 aligncenter" src="https://www.richardmaddison.com/wp-content/uploads/2018/03/Circle-Picture-300x296.png" alt="" width="401" height="396" srcset="https://www.richardmaddison.com/wp-content/uploads/2018/03/Circle-Picture-300x296.png 300w, https://www.richardmaddison.com/wp-content/uploads/2018/03/Circle-Picture-768x757.png 768w, https://www.richardmaddison.com/wp-content/uploads/2018/03/Circle-Picture-700x690.png 700w, https://www.richardmaddison.com/wp-content/uploads/2018/03/Circle-Picture.png 908w" sizes="(max-width: 401px) 100vw, 401px" /></p>
<p>The assignment required the neural network to find a boundary between red and black dots. This diagram shows the neural net’s final output as shaded orange &amp; blue areas. <span style="font-weight: 400;">I had struggled to visualise what was happening inside the neural network and desperately wanted to see something in ‘my’ language, Excel. I searched the net for examples and came up with nothing other than some single-neuron examples and one rough MNIST digit classifier. This didn’t make sense to me as I would have thought that Excel was an ideal teaching medium; lots of native speakers and a 2D layout ideal for exploring the dimensions of the inner hidden layers. </span></p>
<p><b>Could Excel expose the mystery of Neural Networks?</b></p>
<p><span style="font-weight: 400;">So, one sunny afternoon at home in India, I set off on a mission. The first step was to grab Andrew’s data – nightmare, I couldn’t lift it out of Python; I was illiterate. After 2 futile hours, I did the obvious and constructed it myself. This took 5 minutes and opened the possibility of interesting patterns, say a doughnut or a letter. With the data in hand I began construction under the assumption that at some point I’d encounter insurmountable barriers on the way, but to my delight, there weren’t any. After a couple of hours, and with some luck, I’d built it and was ready to start the iterations and learn the function. I hit the macro button and watched for a while; however, nothing happened. I restarted a few times, checked the code, then headed off for a coffee. Now the joy of a neural network is that it programs itself, which in Excel takes luck and time. I came back a couple of hours later. Wow! I had iterating patterns and a learning curve that was headed in the right direction. The slow speed of my code was, to some extent, a plus and allowed me to see the function develop and ultimately segregate the dots. With the spreadsheet working, and through the process of building it, the mystery of basic neural networks and back-propagation was finally clear and exposed.</span></p>
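<p>For anyone wanting to reproduce the data, here is a sketch of the construction in Python. The radii, noise level and class balance are illustrative guesses, not the values used in the spreadsheet:</p>

```python
import numpy as np

def make_circle_data(n=400, r_inner=1.0, r_outer=2.0, noise=0.15, seed=0):
    """Two-class scatter: class 1 (red) on an inner circle, class 0 (black)
    on an outer ring, with Gaussian jitter on the coordinates."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0, 2 * np.pi, n)
    # Alternate points between the inner and outer radius.
    radius = np.where(np.arange(n) % 2 == 0, r_inner, r_outer)
    y = (radius == r_inner).astype(float)
    X = np.stack([radius * np.cos(theta), radius * np.sin(theta)])
    X += noise * rng.standard_normal(X.shape)
    return X, y   # X: (2, n) coordinates, y: (n,) labels
```
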
<div id="attachment_21391" style="width: 1664px" class="wp-caption aligncenter"><img aria-describedby="caption-attachment-21391" loading="lazy" class="wp-image-21391 size-full" src="https://www.richardmaddison.com/wp-content/uploads/2018/03/27011244-e1522326081232.png" alt="" width="1654" height="743" srcset="https://www.richardmaddison.com/wp-content/uploads/2018/03/27011244-e1522326081232.png 1654w, https://www.richardmaddison.com/wp-content/uploads/2018/03/27011244-e1522326081232-300x135.png 300w, https://www.richardmaddison.com/wp-content/uploads/2018/03/27011244-e1522326081232-768x345.png 768w, https://www.richardmaddison.com/wp-content/uploads/2018/03/27011244-e1522326081232-1024x460.png 1024w, https://www.richardmaddison.com/wp-content/uploads/2018/03/27011244-e1522326081232-700x314.png 700w, https://www.richardmaddison.com/wp-content/uploads/2018/03/27011244-e1522326081232-1100x494.png 1100w, https://www.richardmaddison.com/wp-content/uploads/2018/03/27011244-e1522326081232-600x270.png 600w" sizes="(max-width: 1654px) 100vw, 1654px" /><p id="caption-attachment-21391" class="wp-caption-text">The learning curve drops off at 400 iterations and by 1000 the neural network has learnt and represented the hidden function.</p></div>
<p><span style="font-weight: 400;">The first model was very slow, but Andrew was covering momentum and regularisation that week in his course, so I plugged the first in and attempted the second, regularisation, by making the random scatter change each iteration. This was probably more like data augmentation, but it was all new to me. My thinking was that the base function behind the data was three circles and re-running the scatter on each iteration would help the model learn this underlying function rather than the training set &#8211; I got a huge increase in speed and some fascinating results on the boundary diagram.</span></p>
<p><img loading="lazy" class="alignnone wp-image-21403 size-thumbnail" src="https://www.richardmaddison.com/wp-content/uploads/2018/03/1-e1522327953453-150x150.png" alt="" width="150" height="150" srcset="https://www.richardmaddison.com/wp-content/uploads/2018/03/1-e1522327953453-150x150.png 150w, https://www.richardmaddison.com/wp-content/uploads/2018/03/1-e1522327953453-100x100.png 100w" sizes="(max-width: 150px) 100vw, 150px" /><img loading="lazy" class="alignnone wp-image-21402 size-thumbnail" src="https://www.richardmaddison.com/wp-content/uploads/2018/03/2-e1522328102631-150x150.png" alt="" width="150" height="150" srcset="https://www.richardmaddison.com/wp-content/uploads/2018/03/2-e1522328102631-150x150.png 150w, https://www.richardmaddison.com/wp-content/uploads/2018/03/2-e1522328102631-100x100.png 100w" sizes="(max-width: 150px) 100vw, 150px" /><img loading="lazy" class="alignnone wp-image-21401 size-thumbnail" src="https://www.richardmaddison.com/wp-content/uploads/2018/03/3-e1522328159761-150x150.png" alt="" width="150" height="150" srcset="https://www.richardmaddison.com/wp-content/uploads/2018/03/3-e1522328159761-150x150.png 150w, https://www.richardmaddison.com/wp-content/uploads/2018/03/3-e1522328159761-100x100.png 100w, https://www.richardmaddison.com/wp-content/uploads/2018/03/3-e1522328159761.png 215w" sizes="(max-width: 150px) 100vw, 150px" /><img loading="lazy" class="alignnone wp-image-21400 size-thumbnail" src="https://www.richardmaddison.com/wp-content/uploads/2018/03/4-e1522328292433-150x150.png" alt="" width="150" height="150" srcset="https://www.richardmaddison.com/wp-content/uploads/2018/03/4-e1522328292433-150x150.png 150w, https://www.richardmaddison.com/wp-content/uploads/2018/03/4-e1522328292433-100x100.png 100w" sizes="(max-width: 150px) 100vw, 150px" /><img loading="lazy" class="alignnone wp-image-21399 size-thumbnail" src="https://www.richardmaddison.com/wp-content/uploads/2018/03/5-e1522328423568-150x150.png" alt="" width="150" 
height="150" srcset="https://www.richardmaddison.com/wp-content/uploads/2018/03/5-e1522328423568-150x150.png 150w, https://www.richardmaddison.com/wp-content/uploads/2018/03/5-e1522328423568-100x100.png 100w" sizes="(max-width: 150px) 100vw, 150px" /><img loading="lazy" class="alignnone wp-image-21398 size-thumbnail" src="https://www.richardmaddison.com/wp-content/uploads/2018/03/6-e1522328588822-150x150.png" alt="" width="150" height="150" srcset="https://www.richardmaddison.com/wp-content/uploads/2018/03/6-e1522328588822-150x150.png 150w, https://www.richardmaddison.com/wp-content/uploads/2018/03/6-e1522328588822-100x100.png 100w" sizes="(max-width: 150px) 100vw, 150px" /></p>
<p><i><span style="font-weight: 400;">During the learning process (left to right) the boundary function shows weird patterns as it closes in on the red dots. </span></i></p>
<p><span style="font-weight: 400;">The Neural Net architecture and equations were more or less straight out of Andrew’s course but placed into an Excel sheet. The layout shown below is very much orientated towards an Excel instantiation; it allowed me to update the model&#8217;s Weights and Momentum in a single line of VBA for each iteration, namely: Parameters (T) = Parameters (T+1) .value. I subsequently built a fully recursive version with no VBA that made use of Excel’s inbuilt iteration functionality, but this is less satisfying to run.</span></p>
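<p>In Python, the momentum update that the single VBA copy performs on-sheet might look like this. The learning rate and momentum coefficient are illustrative, not taken from the workbook:</p>

```python
import numpy as np

def momentum_step(params, grads, velocity, lr=0.1, beta=0.9):
    """One gradient-descent-with-momentum update: the Python analogue of
    copying the Parameters (T+1) block back over the Parameters (T) block.
    Returns the updated parameters and the updated velocity."""
    new_v = beta * velocity + (1 - beta) * grads
    return params - lr * new_v, new_v
```

<p>On the sheet, <code>new_v</code> is the Momentum block and <code>params - lr * new_v</code> is the next-iteration Parameters block; the VBA line just makes the copy that closes the loop.</p>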
<p>You can view a video of the model in operation here: <a href="https://www.youtube.com/watch?v=mIpJu-I13cc">https://www.youtube.com/watch?v=mIpJu-I13cc</a></p>
<div id="attachment_21390" style="width: 1034px" class="wp-caption aligncenter"><img aria-describedby="caption-attachment-21390" loading="lazy" class="wp-image-21390 size-large" src="https://www.richardmaddison.com/wp-content/uploads/2018/03/74227182-e1522328687951-1024x616.png" alt="" width="1024" height="616" srcset="https://www.richardmaddison.com/wp-content/uploads/2018/03/74227182-e1522328687951-1024x616.png 1024w, https://www.richardmaddison.com/wp-content/uploads/2018/03/74227182-e1522328687951-300x181.png 300w, https://www.richardmaddison.com/wp-content/uploads/2018/03/74227182-e1522328687951-768x462.png 768w, https://www.richardmaddison.com/wp-content/uploads/2018/03/74227182-e1522328687951-700x421.png 700w, https://www.richardmaddison.com/wp-content/uploads/2018/03/74227182-e1522328687951-1100x662.png 1100w, https://www.richardmaddison.com/wp-content/uploads/2018/03/74227182-e1522328687951-600x361.png 600w, https://www.richardmaddison.com/wp-content/uploads/2018/03/74227182-e1522328687951.png 1444w" sizes="(max-width: 1024px) 100vw, 1024px" /><p id="caption-attachment-21390" class="wp-caption-text">This is the basic layout of the on-sheet Excel. It required approximately 30 separate Excel formulae.</p></div>
<p>The boundary plot, i.e. the orange and blue pixels that cover the X plot and represent the boundary function discovered by the neural network, required a second neural net without back-propagation<span style="font-weight: 400;"> (the learning bit) that looks as follows:</span></p>
<div id="attachment_21411" style="width: 909px" class="wp-caption aligncenter"><img aria-describedby="caption-attachment-21411" loading="lazy" class="wp-image-21411 size-full" src="https://www.richardmaddison.com/wp-content/uploads/2018/03/Webp.net-resizeimage-5.png" alt="" width="899" height="424" srcset="https://www.richardmaddison.com/wp-content/uploads/2018/03/Webp.net-resizeimage-5.png 899w, https://www.richardmaddison.com/wp-content/uploads/2018/03/Webp.net-resizeimage-5-300x141.png 300w, https://www.richardmaddison.com/wp-content/uploads/2018/03/Webp.net-resizeimage-5-768x362.png 768w, https://www.richardmaddison.com/wp-content/uploads/2018/03/Webp.net-resizeimage-5-700x330.png 700w, https://www.richardmaddison.com/wp-content/uploads/2018/03/Webp.net-resizeimage-5-600x283.png 600w" sizes="(max-width: 899px) 100vw, 899px" /><p id="caption-attachment-21411" class="wp-caption-text">The Neural network used for the boundary function was even simpler with only 8 separate formulae.</p></div>
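<p>The same forward-only idea in Python: run the trained prediction function over a grid of XY points and threshold the output to colour the regions. The <code>predict</code> signature here is an assumption:</p>

```python
import numpy as np

def boundary_grid(predict, lo=-4.0, hi=4.0, steps=100):
    """Evaluate a trained forward pass over an XY grid to colour the
    orange/blue regions. `predict` is assumed to take a (2, m) array of
    coordinates and return (m,) probabilities."""
    xs = np.linspace(lo, hi, steps)
    xx, yy = np.meshgrid(xs, xs)
    pts = np.stack([xx.ravel(), yy.ravel()])        # (2, steps*steps)
    return (predict(pts) > 0.5).reshape(steps, steps)
```

<p>No gradients are needed here, which is why the on-sheet boundary net is so much smaller than the learning net.</p>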
<p>I subsequently recorded a construction video which runs for over an hour but gives a lot more detail on the Excel needed to get this up and running. You can view the video here: <a href="https://www.youtube.com/watch?v=suZhX6N5LAk&amp;t=3042s">https://www.youtube.com/watch?v=suZhX6N5LAk&amp;t=3042s</a></p>
<p>You can download the Excel file here: <a href="https://drive.google.com/open?id=1kjnTsY9yC3QaQx3f_XHhoouVULLBYgs4" target="_blank" class="emd_dl_grey_light">Download Excel</a>
        <style type="text/css">
    .emd_dl_grey_light {
        -moz-box-shadow:inset 0px 1px 0px 0px #ffffff;
        -webkit-box-shadow:inset 0px 1px 0px 0px #ffffff;
        box-shadow:inset 0px 1px 0px 0px #ffffff;
        background:-webkit-gradient( linear, left top, left bottom, color-stop(0.05, #f9f9f9), color-stop(1, #e9e9e9) );
        background:-moz-linear-gradient( center top, #f9f9f9 5%, #e9e9e9 100% );
        filter:progid:DXImageTransform.Microsoft.gradient(startColorstr='#f9f9f9', endColorstr='#e9e9e9');
        background-color:#f9f9f9;
        -webkit-border-top-left-radius:0px;
        -moz-border-radius-topleft:0px;
        border-top-left-radius:0px;
        -webkit-border-top-right-radius:0px;
        -moz-border-radius-topright:0px;
        border-top-right-radius:0px;
        -webkit-border-bottom-right-radius:0px;
        -moz-border-radius-bottomright:0px;
        border-bottom-right-radius:0px;
        -webkit-border-bottom-left-radius:0px;
        -moz-border-radius-bottomleft:0px;
        border-bottom-left-radius:0px;
        text-indent:0;
        border:1px solid #dcdcdc;
        display:inline-block;
        color:#666666 !important;
        font-family:Georgia;
        font-size:15px;
        font-weight:bold;
        font-style:normal;
        height:41px;
        line-height:41px;
        width:153px;
        text-decoration:none;
        text-align:center;
        text-shadow:1px 1px 0px #ffffff;
    }
    .emd_dl_grey_light:hover {
        background:-webkit-gradient( linear, left top, left bottom, color-stop(0.05, #e9e9e9), color-stop(1, #f9f9f9) );
        background:-moz-linear-gradient( center top, #e9e9e9 5%, #f9f9f9 100% );
        filter:progid:DXImageTransform.Microsoft.gradient(startColorstr='#e9e9e9', endColorstr='#f9f9f9');
        background-color:#e9e9e9;
    }.emd_dl_grey_light:active {
        position:relative;
        top:1px;
    }
    </style></p>
<p>These videos and associated material, if released, are available to you under a Creative Commons Attribution-NonCommercial 4.0 International Licence, the details of which can be found here: https://creativecommons.org/licenses/by-nc/4.0/legalcode</p>
<p>The post <a rel="nofollow" href="https://www.richardmaddison.com/2018/03/29/neural-networks-in-excel-finding-andrew-ngs-hidden-circle/">Neural Networks in Excel – Finding Andrew Ng’s Hidden Circle</a> appeared first on <a rel="nofollow" href="https://www.richardmaddison.com">NN in XL</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.richardmaddison.com/2018/03/29/neural-networks-in-excel-finding-andrew-ngs-hidden-circle/feed/</wfw:commentRss>
			<slash:comments>15</slash:comments>
		
		
			</item>
	</channel>
</rss>
