Blog

4/10/2015 1:06:30 PM

Since my move back to consulting I have been involved with a bunch of different big and small companies, each with several big and small projects. The one thing I almost always see missing is the ability to measure the success of a project by any of the numerous ways to gauge success. I have been a fan of the notion of “Measure anything, measure everything” since reading it in February of 2011.

This article basically suggests that you should have the ability to measure anything in your “system”. And by system I don’t just mean tracking aspects of your code. There are a great many companies that seem to think that you simply write some code, deploy the world, and cross your fingers. I have worked at many companies of this style, and it is no secret that they spend a lot of time “reacting” to customer complaints rather than “pro-actively” identifying and fixing issues iteratively…and hopefully not in production.

There are a few companies I have worked with that have taken this one step further. They attempt to remove the random behavior of their applications by creating performance baselines. This concept is essentially running a load test in some form or another to see what works, what doesn’t, response times, effect on the underlying system, etc. With all this timing data in hand you have a baseline. This means that you can now safely add features to your application, rerun the performance tests, and identify differences between your baseline and the new performance measurements. This is considerably better!
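
As a rough illustration of that baseline comparison, here is a minimal Python sketch. The operation names, the sample timings, and the 10% tolerance are all made up for the example; a real load-test tool would supply its own data and thresholds.

```python
from statistics import median

def compare_to_baseline(baseline, current, tolerance=0.10):
    """Return operations whose median response time grew by more than `tolerance`.

    `baseline` and `current` map an operation name to a list of response
    times in milliseconds. The 10% default tolerance is an arbitrary choice.
    """
    regressions = {}
    for name, base_samples in baseline.items():
        if name not in current:
            continue  # operation was not measured in the new run
        base = median(base_samples)
        now = median(current[name])
        if base > 0 and (now - base) / base > tolerance:
            regressions[name] = (base, now)
    return regressions

# Hypothetical timings from a run before and after adding a feature.
baseline = {"checkout": [120, 130, 125], "search": [40, 42, 41]}
after_release = {"checkout": [180, 175, 190], "search": [41, 40, 43]}

regressions = compare_to_baseline(baseline, after_release)
print(regressions)
```

Using the median rather than the mean keeps a single slow outlier from flagging a regression; that is a design choice for this sketch, not a rule.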

But it could be sooo much better! I am now working with a few customers that have sizeable applications both in terms of complexity and load. In both cases the systems are distributed, multi-runtime systems. At the simplest they have web applications, API applications, a messaging bus, various data stores, many third party dependencies, various tablet clients, etc. There are a lot of moving pieces. A simple load test won’t give us a complete picture with regards to subtle changes in the system. Also, with this many things to monitor, testing once before deploying won’t cut it. You really need to measure the health of the system, in many different ways, continuously.

When you ask a developer how they get information about their system the default answer is usually a logging strategy. Generally this is something like log4net writing to a file or database. But when you have 100 nodes all doing different things it is generally difficult to figure out what you should be looking at by digging through a log dump. I was in a brown bag at Clear Measure (we do those weekly) with one of my favorite logging vendors, LogEntries. My friend Trevor Parsons was telling us about all the new features that their centralized logging solution provides. I was quite impressed at the rate they are adding awesome new features.

LogEntries

LogEntries is essentially (at least in my world) an appender for log4net. This means writing logging code is the same as it always has been. Only now we redirect that log data to a central hosted location where we can easily mine the data. Most people might jump in and say that sounds like SumoLogic or LogStash or any number of other logging aggregators. On the surface this is indeed true. But what I like about LogEntries is that you can start to tweak your log data to be a part of your APM (application performance monitoring) story. With LogEntries you can write tags for various patterns in your log data. In LogEntries the log data is parsed and processed as it comes in, in real time, not as some post-process. Meaning when data comes in that says the order taking service has stopped processing (or some other contrived really important business feature in your system) you can see it immediately.
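
To make the idea of tagging log lines as they arrive a little more concrete, here is a small Python sketch. It is not the LogEntries API. The tag names, the regular expressions, and the sample log lines are all invented for illustration.

```python
import re

# Hypothetical tag definitions: tag name -> pattern to look for in a log line.
TAG_PATTERNS = {
    "order-service-stopped": re.compile(r"order taking service .*stopped", re.IGNORECASE),
    "error": re.compile(r"\bERROR\b"),
}

def tag_line(line):
    """Return the names of all tags whose pattern matches this log line."""
    return [tag for tag, pattern in TAG_PATTERNS.items() if pattern.search(line)]

def tag_stream(lines):
    """Yield (line, tags) for each line that matches a tag, as lines arrive."""
    for line in lines:
        tags = tag_line(line)
        if tags:
            yield line, tags

# Made-up log lines standing in for a live stream.
sample = [
    "2015-04-10 13:06:30 INFO  Order 1234 accepted",
    "2015-04-10 13:07:02 ERROR Order taking service has stopped processing",
]

tagged = list(tag_stream(sample))
for line, tags in tagged:
    print(tags, line)
```

Because `tag_stream` is a generator, each line is checked the moment it is read rather than in a later batch pass, which is the property the paragraph above is describing.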

Of course tagging is only worthwhile if you are looking at the log stream. We don’t usually do that. So how do we take advantage of this real time processing and tagging? We can make reports around the log data. Huge dashboards full of information can also be populated based on log processing. When something important happens you can visualize the occurrence on their dashboards. Again, this is only valuable if you have an information radiator in front of you…you have that, right? We use GeckoBoard for this if you were wondering. So how do we take this to a passive style of monitoring? LogEntries also has the ability to broadcast events. This used to be just via email, which was good enough. But now they have all sorts of other integrations such as Slack and PagerDuty, which makes this story even better. These two tools have become very valuable to me.

image

NewRelic / APM

This is just logging data. There is so much more to the monitoring story. Since I mentioned APM above, you should also be looking at NewRelic! Any good APM tool will allow you to visualize your system from the client (JavaScript, iOS, Android) all the way back to the most internal of systems (think databases). These sorts of tools will tell you all about how long a process took. They should allow you to visualize where a bottleneck is in your system. And you should be able to drill down into the visual map to determine with great granularity exactly where to look for problems. If you have slow queries your APM tool should be able to pinpoint which stored procedure call isn’t performing, and when.

With this sort of data in hand you can head back to LogEntries to ferret out exactly what is going on. But can you answer why this happened? It was working an hour ago. Did we do something to cause this new flare-up? Hmm. Enter StatsD and the notion of capturing metrics for everything you do. Tracking business metrics in the same way that you track application code, sub systems, and processes is very important. The events that have historically bit me in the butt are easy stories that most people can relate to. Everybody has had a working system. And shortly after launching new code a feature started to act up in unexplainable ways. Yes, you launched 3 new features…and 10 new bugs. One of those bugs may have caused a database process to go wonky. Or you may have killed your ability to accept orders. Hopefully you have unit tests, integration tests, and functional tests to ensure that this type of error doesn’t find its way into production…but.

Statsd

How do you track when you last did a deployment? Most people go talk to the guy with the deployment button at his fingertips. However, if you work somewhere that doesn’t have automated deployments this “button” may be a manual process. This means that while the deployment guy may normally do 10 steps, today he did 9. Missing that 10th step may have caused the issue. And not having metrics around when you deployed, and specifically what was deployed, means it will take longer to figure out the issue.

StatsD is a metrics tool. It only does a few very simple things. It can say “this metric occurred”. It can say “this metric took this long to complete”. Or “this value is now X”. It does this in a very passive manner over UDP, which means that if nothing is listening on the other side your code doesn’t fail. It also means that there is nearly no performance impact from gathering this data from any part of your system or processes. And you can “sample” the data in the case that you just need to know rough numbers, for example by tracking only every 10th time something occurs. In C# you are just a NuGet package away from starting to use StatsD.
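
The three kinds of message described above map onto the plain-text StatsD line format: `name:value|c` for a counter, `name:value|ms` for a timer, and `name:value|g` for a gauge, with an optional `|@rate` suffix when sampling. Here is a minimal Python sketch of a fire-and-forget sender. The class name, the metric names, and the choice of localhost are assumptions for illustration (8125 is the usual StatsD port); in practice you would reach for an existing client library.

```python
import random
import socket

def format_metric(name, value, metric_type, sample_rate=1.0):
    """Build one StatsD line, e.g. 'orders.accepted:1|c|@0.1'."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate < 1.0:
        line += f"|@{sample_rate}"
    return line

class StatsdSketch:
    """Illustrative UDP sender; not a real client library."""

    def __init__(self, host="127.0.0.1", port=8125):
        self.address = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _send(self, line, sample_rate=1.0):
        # Sampling: only send roughly `sample_rate` of the events.
        if sample_rate < 1.0 and random.random() >= sample_rate:
            return False
        try:
            # UDP is connectionless, so this does not wait for a listener.
            self.sock.sendto(line.encode("ascii"), self.address)
        except OSError:
            return False  # metrics must never break the calling code
        return True

    def increment(self, name, sample_rate=1.0):
        """'This metric occurred.'"""
        return self._send(format_metric(name, 1, "c", sample_rate), sample_rate)

    def timing(self, name, milliseconds):
        """'This metric took this long to complete.'"""
        return self._send(format_metric(name, milliseconds, "ms"))

    def gauge(self, name, value):
        """'This value is now X.'"""
        return self._send(format_metric(name, value, "g"))
```

Calling `StatsdSketch().increment("orders.accepted", sample_rate=0.1)` would send about one in ten events, and the `@0.1` suffix tells the server to scale the count back up.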

With StatsD you can wire metrics into your deployment process with a great deal of granularity.  Enough to see that “the deployment happened at this time on this day”.  And you can track that “step 1 happened, step 2 happened, step 4 happened, step 5 happened”.  Hey wait…where is step 3?  Like LogEntries, StatsD is only half of the equation.  Without the ability to get to and visualize the data, report on the data, or get alerts from the data, why capture it?
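
Spotting the missing step from recorded metrics can be as simple as comparing what was expected with what actually showed up. A tiny Python sketch, with hypothetical metric names for the deployment steps:

```python
def missing_steps(expected_steps, recorded_steps):
    """Return the expected steps that never reported a metric, in order."""
    recorded = set(recorded_steps)
    return [step for step in expected_steps if step not in recorded]

# Hypothetical metric names, one per deployment step.
expected = [f"deploy.step{n}" for n in range(1, 6)]
recorded = ["deploy.step1", "deploy.step2", "deploy.step4", "deploy.step5"]

missing = missing_steps(expected, recorded)
print(missing)
```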

Graphite

Graphite is one of many reporting packages that works with StatsD repositories of data. I have customers that don’t necessarily want to stand up these sorts of components in their environments. For that reason I tend to stick to hosted solutions whenever possible. They are usually easier to get up and running quickly, and they tend not to require buy-in from all the parties required to keep the tool running. For this I have been tinkering with Hosted Graphite a lot lately. It basically allows you to put your metrics and counters on a graph over time.

image

Now you are ready

With all of this data in hand you are ready to answer almost any question about your system. And if you are doing any form of continuous delivery or even continuous deployment, when you find that someone has not measured enough data you can easily wire in new metrics, deploy, and start capturing more information about your system. I can’t wait to tell you how we are using these principles to monitor the health of our projects, our engineering team’s morale, and our clients’ happiness.

10/23/2014 1:23:55 AM

It’s official, I’ve been nominated and accepted by Microsoft as a Virtual Technology Specialist (vTSP) in Azure! I’m extremely excited to be considered among the elite in the Microsoft Partner Community, and I look forward to learning more about the program and helping businesses find value using Microsoft Azure.

One of the first questions I had during this process was “What is a VTS?”

I’ve located this text, which explains the VTS Program:  The Microsoft Virtual Technology Specialist Program (VTSP) is a select group chosen from the elite in Microsoft’s partner community, whose focus is to augment Microsoft’s internal Technology Specialist team. Their primary role is to communicate the value of Microsoft Solutions to customers and to provide architectural guidance for Enterprise Integration solutions. The Microsoft VTSP program was designed to create a deeper relationship with Microsoft Partners, the Product Teams at Microsoft Corporate, and Regional Microsoft Offices, in order to provide highly skilled solution specialists to Microsoft customers. It is designed to enable a high performance team of partner-based resources to deliver pre-sale activities and resources to empower customers and help them meet their solution and integration needs.

Great, so now we know what the VTS program is about, but what’s also really cool is I now get access to information on the Microsoft corporate network such as knowledge bases, technical articles, training materials, and other resources. I also get guest access to Microsoft facilities, and priority for participating in Microsoft marketing events.

Overall this is a great opportunity for me as well as for Clear Measure, which now has 3 Virtual Technology Specialists in its ranks, many more MVPs, and even more ASP and C# Insiders. I’m proud to be a part of this elite community, and I look forward to all that comes from it.

I have to thank Clear Measure for putting me on this path.  I can’t say enough about how much I #LoveClearmeasure!!!

10/22/2014 4:07:11 PM

Scott Hanselman recently posted about putting yourself out there by publishing an open source project in which you do the best you can today and put it out there for others to use…and ultimately critique. And that is hard. I recently posted an image I love describing where the magic happens, which touches on the idea of stepping out of your comfort zone to push yourself to be better. This notion applies to nearly every industry, it would seem. And it seems to be a topic that is top of mind for many people, throughout our history as people!

image

If you look into this topic you will find it is well described and goes way back in our history. How is it that making yourself better by stepping out of your comfort zone, to help you get on a path to greater success, isn’t taught from day one in our school systems?

“He who deliberates fully before taking a step will spend his entire life on one leg.” ~Chinese Proverb

This is the conversation I always find myself having with people when trying to get them to take the next step in their programming career. Regardless of whether that next step is to write a blog post, publish an open source project, contribute to an existing project, speak, host a group or internet discussion, work somewhere where you are the dumbest guy in the room, etc., it always comes down to these questions and statements.

I’m not going to put myself out there.

I’m afraid to fail.

I’m afraid to succeed.

What are they going to say about me?

What if I’m not good enough?

What if they laugh at me?

Are people going to think I’m weird?

What are people going to think of me?

Looking back at the first book I published I am horrified. At that time I thought distributed was simply moving the work over the wire to another box. A web server called a backend web service hosted elsewhere, but it was still a synchronous call for every action. Yes…that somewhat improves some scaling stories. But it is not distributed. And that book was on web forms with the MVP pattern to support better testing…at the same time that the new MVC framework was coming out, effectively the “death of web forms”…we thought. It was also a snapshot of my understanding of DDD…but not DDD as I know it today. There are all sorts of things in that book that I would never do now. If I had my preference I would have taken that book back shortly after I published it. But it was my best effort at the time.

AND IT DID GREAT.  And CONTINUES to do great.  Because it helps people for whom that book is still a good definition of how to do things better.  Or for institutions that have still not taken the jump to MVC…it helps.

As an example, that book is #68 on Amazon in the category of ASP.NET. Compare that to “ASP.NET MVC in Action”, which came out in the same time frame and was clearly the better book (in my mind), now placed #1,668,320. The latest version of that book, “ASP.NET MVC 4 in Action”, is #846,639, and it covers a way more current topic.

Put your thoughts, whatever they are, out there for all to see. You will help someone. Even if you only help them by showing them how “not to do” something. That is help. I can’t tell you how many iterations of YouTube videos I go through on my farm prior to jumping off into unexplored territory. There are way more “how not to do” videos out there than videos that really help me grok something. But it all has value.

A related side note: A focus on helping others is an easy way to achieve many goals for success.  Remove the me me me from your vocabulary and thought processes and you will shine.  If you did something that helped somebody in your day to day life, at work or otherwise, write that down and publish it for all to see.  If you write code that makes someone else’s job better, publish that.  Almost everybody does something small for people near to them. 

You just need to get over the hump to share it to a global audience and you are on the right path!

10/17/2014 7:46:26 PM

Josh Handel presented some great inner detail around Azure Search this week. Here are his slides and a recorded video of the meeting if you missed it.

· Slides: https://onedrive.live.com/?cid=c861adfa90a758e1&id=C861ADFA90A758E1%2162009&ithint=folder,pptx&authkey=!APYakLmnZVx5LQU

· Recording: http://usergroup.tv/videos/azure-search (thank you Shawn Weisfeld!!!)

In November we will have Paul Drew from DevOps Unlimited show us how to build end-to-end Azure environments using BoxStarter and other tools.

If you would like to present please let me know so that I can get you on the schedule! We need a December topic.

Thank you goes to Clear Measure for sponsoring us at the last meetup (pizza and drinks). But we are certainly up for making the group better with more sponsors. If you know of anyone that is interested in sponsoring the AzureAustin group please send them my way. Here is the basic blurb I have been sharing to try to capture some sponsors. If you love this group please feel free to share it with your connections too:

“I am looking for sponsors for our AzureAustin meetup group hosted at the Microsoft campus here in Austin. This is great exposure if you are a product or service provider or a talent recruiter. We meet once a month. And I send out a newsletter at least once a week capturing the goings on of Azure. As a sponsor you would receive mention in all of these communications, can have a personal presence at the meetings themselves, and get a mention on the meetup page. We basically need $100/mo for pizza and drinks for the meetup that occurs every third Wednesday at 6pm. Here is the group: http://lnkd.in/bmmXkFn”

New Azurians (76 total):

Let’s get this number up!  I would love to get us to 90 or 100 users by the end of the year if possible.  Let’s start a competition.  If you refer a friend to the group please send me your name and their name.  I will enter you to win something fun in a drawing at the end of the year.  Perhaps a resharper license or similar?  I am open on the prize ideas.

Welcome to the group: Josh Handel, Jeremy, Rajiv Menon

Azure info for the week:

Free

· Azure + Chef = Awesome: Don’t get spooked by LAMP on azure!

· Scott Gu free ebook "Building cloud apps with azure"

· Microsoft Azure courses

· Three more new getting started videos!

Web Casts

· Azure Automation 101 with PowerShell and Eamon O’Reilly

· Azure Automation 102 with PowerShell and Chris Sanders

· Building myEcho – A real startup running on Azure

· Encryption in SQL Server Virtual Machines in Azure for better security

· Best practices for securing azure sql database

· Security logging and audit log collection within azure

· Azure media services and content protection with Mingfei Yan

· The java SDK for azure management with brady gaster

· Virtual networks with Narayan Annamalai

Pod Casts

· Make it real with a windows phone app with azure backend

· DocumentDB

· DevOps automation as a service

· Full stack real time monitoring from the cloud

Posts

· Docker: Docker and Microsoft: Integrating docker with windows server and MS Azure

· Docker: New windows server containers and Azure support for Docker

· Redis: Azure: Redis cache, disaster recovery to azure, tagging support, elastic scale for SQLDB, DocDB

· ML: Web services and marketplaces create a new data science economy

· ML: Distributed cloud-based machine learning

· Service Bus: NamespaceType default value change

· Notification Hubs: Rich Push

· Notification Hubs: Notify Users

· Storm Preview: Microsoft brings real-time analytics to Hadoop with Storm preview

· Accessing and using azure VM unique ID

· Table Storage: Better support for paging with Table Storage in Azure Mobile Services .NET backend

· SQL: Auditing for azure sql database now with more manageability options

· Xamarin: Building cross-platform apps with Xamarin and azure mobile services

· DocumentDB: Azure documentdb: profile of MSN health and fitness

· Preview: Support site extension for azure websites

· NEW: Managing media workflows with the new azure media services explorer tool

· Azure traffic manager

· NEW: Azure automation runbook gallery

· Mobile Services: Beta iOS SDK released

· Live 24/7 Reference streams available

· D-Series performance expectations

· Disaster recovery to azure using azure site recovery is now GA

· Automating the tedious parts of open source on Azure

Snippets/Tools

· NuGet RedDog.Search

· RedDog Search Portal

News

· Contributions to azure documentation and SDK’s on GitHub just got simpler

Fun

· Predict the 2014 us elections and more

· Azure ML is helping CMU become more energy efficient

Jobs

- If you like building cutting edge apps targeted for Azure come apply at the coolest engineering shop in Austin - Clear Measure!  We are always looking for talented people.  http://www.clear-measure.com/careers/

- Others listed on Indeed!

Last Week’s Outages (status)

October 2014

10/17

Network Infrastructure - East Asia, North Central US, South Central US and Southeast Asia - Partial Service Interruption

This incident has now been mitigated. From 22:30 UTC on 10/16/2014 to 10:40 UTC on 10/17/2014 a subset of customers may have seen an impact on connectivity between services in: (North Central US and Southeast Asia), (North Central US and East Asia), (South Central US and Southeast Asia). They may not be able to connect to some of their resources as a result.

10/17

SQL Database - West US - Advisory

This issue has now been mitigated. From 6:49 UTC to 8:13 UTC on 10/17/2014 a small subset of customers using SQL Database in West US may have experienced failures when they attempted to create a new database. Existing databases were not impacted.

10/16

Virtual Machines - East US - Partial Service Interruption

Starting at 13:15 UTC until 22:00 UTC on 16 October 2014 a subset of customers may have been intermittently unable to access their Virtual Machine in East US. This issue is now mitigated.

10/16

Websites - West US - Advisory

This incident has now been mitigated. From 23:36 UTC on 10/15/2014 to 3:06 UTC on 10/16/2014 a limited subset of customers using websites in West US may have seen 503 errors when attempting to access their websites.

10/16

Visual Studio Online \ Application Insights - Multi-Region - Partial Service Interruption

This issue has now been mitigated. From 1:02 UTC to 4:10 UTC on 10/16/2014 a subset of customers using Application Insights may have experienced latency or inability to access telemetry data in https://portal.azure.com/.

10/15

Websites - Multiple Regions - Advisory

From 14:30 UTC to 23:20 UTC on 15 October, 2014 customers in multiple regions attempting to create a Website in the Web App Gallery through https://manage.windowsazure.com may have experienced an Invalid URI error message. Standalone website creation outside of the Web App Gallery including Web+SQL or Web+MySQL was not affected by this issue. This issue is now resolved.

10/15

Azure Managed Cache- Multiple Regions - Advisory

From 18:00 UTC on 14 October, 2014 to 15:55 UTC on 15th October, 2014 a subset of customers using Managed Cache in multiple regions may have seen failures while performing new create and enable operations programmatically. Existing caches were not impacted by this issue. Our engineers have validated that this issue is resolved.

10/15

Visual Studio Online \ Application Insights - Multi-Region - Partial Service Interruption

This incident has now been mitigated. From 4:54 UTC to 8:45 UTC on 10/15/2014 a subset of customers using Visual Studio Online \ Application Insights may have experienced latency or inability to access data through https://portal.azure.com/.

10/15

Virtual Machines - East US - Partial Service Interruption

This incident has now been mitigated. From 5:25 UTC to 6:54 UTC on 10/15/2014 a subset of customers using Virtual Machines in East US may not have been able to access their Virtual Machines.

10/15

Azure Redis Cache - Multiple Regions - Advisory

This incident has now been mitigated. From 18:00 UTC on 14 October, 2014 to 1:30 UTC on 15 October, 2014 a subset of customers using Azure Redis Cache in multiple regions may have been unable to create a new cache. Existing caches are not impacted by this issue.

10/14

Visual Studio Online \ Application Insights - Multi-Region - Partial Performance Degradation

From 21:30 UTC on 13 October to 20:42 UTC on 14 October, 2014 customers using Application Insights may have experienced latency or inability to access telemetry data in https://portal.azure.com . Our engineers have validated that this issue is fully mitigated.

10/14

Media Services \ Encoding - West Europe - Partial Performance Degradation

This incident has now been mitigated. From 00:40 UTC to 2:05 UTC on 10/14/2014 Media Services encoding customers in West Europe may have experienced failures. There was no expected impact on streaming.

10/14

Storage - North Central US - Partial Service Interruption

From 22:10 to 23:30 UTC on 13 October, 2014 a limited subset of customers using Storage in North Central US may have experienced inability to access their Storage resources. This issue has been mitigated.

10/13

Virtual Machines - Partial Service Interruption

From 20:30 to 21:39 UTC on 13 October, 2014 customers may have been unable to access their Virtual Machines in East US for a brief period of time. Engineers have fully validated service recovery. This issue is now mitigated.

10/10

Visual Studio Online \ Application Insights - Multi-Region - Advisory (Limited Impact)

Between approximately 16:15 and 17:45 UTC on 10 Oct, 2014 Application Insights customers will be unable to view Usage or Diagnostic Search data. Our system has self-healed and the issue is now mitigated. Other data sets, including availability, were not impacted by this issue.

10/10

Cloud Services and Virtual Machines - North Central US - Partial Service Interruption

From the 10th Oct, 2014 at 18:51 PM to 21:10 PM UTC a subset of customers using Virtual Machines and Cloud Services in North Central US may have seen services stuck in a Starting state. This incident has now been mitigated.

10/10

Visual Studio Online \ Build - Multi-Region - Partial Performance Degradation

From the 10th Oct 05:30 AM to 16:43 PM UTC, a subset of customers using Hosted Build may have seen their build requests stuck in queued state. Our engineers have mitigated this issue, and queued builds will automatically start.

10/10

Network Infrastructure - North Europe and East US- Partial Service Interruption

From 10th Oct, 2014 at 01:44 AM to 03:59 AM UTC, a subset of customers in East US and North Europe regions may have experienced connectivity issues to Azure Services, such as Virtual Machines, Cloud Services, Azure Active Directory, SQL Import and Export Service, Service Bus, Media Services, Application Insights and Websites. This incident has now been mitigated.

10/9

Automation - East US - Partial Service Interruption

From approximately 22:45 PM on the 8th Oct to 1:21 AM on the 9th of Oct, 2014 UTC, customers may have encountered issues when attempting to access the Automation Gallery. Existing runbooks were unaffected. This incident has now been mitigated.

10/8

Cloud Services and Virtual Machines (Service Management) - East Asia - Partial Service Interruption

From 14:52 PM to 16:30 PM on the 8th Oct, 2014 UTC a subset of customers may have encountered errors with Service Management functions on Cloud Services or Virtual Machines located in East Asia. There was no impact to service availability. This incident is now resolved.

10/6

Automation - East US - Partial Service Interruption

From approximately 16:00 PM to 19:15 PM on the 6th Oct, 2014 UTC customers may have encountered errors when attempting to access the Automation Service through the Azure Management portal. The ability to create or modify Runbooks was temporarily impacted. Existing Runbooks continued to function. If you encountered any errors please retry your request.

10/3

Visual Studio Online \ Application Insights - Multi-Region - Partial Service Interruption

This incident has now been mitigated. From 20:02 UTC to 21:07 UTC on 10/3/2014 a subset of customers may have seen errors submitting or viewing Usage data. Full service functionality is now restored. Some customers may see gaps in their Usage data between 20:02 UTC and 21:07 UTC on 10/3/2014.

10/3

Visual Studio Online \ Application Insights - Multi-Region

From 02 Oct, 2014 10:43 to 03 Oct 2014 01:59 UTC Application Insights users attempting to access usage query data through https://portal.azure.com/ in their Application Insights blades would see HTTP/520 errors when accessing their reports. This incident has now been mitigated.

12/10/2014 3:14:22 PM

I couldn’t agree more with this graphic.  Find something that is truly next level in your world and do it.  Generally it is something that is being done by the “successful” people you admire in your field.  Something you might feel uncomfortable with.

Success means different things to different people…and that is a different topic.

Whomever you admire in your world, start doing what they are doing that most others aren’t. This might be starting a blog. If you are already blogging actively, the next step might be getting published in a magazine. If you have published in a magazine, you might need to write a book. Written a book already? Perhaps you now need to speak at a user group. Done the local user group already? Head out to a big conference. Or start your own user group or conference. Whatever your next most uncomfortable thing to do is, the thing about which you say “I could never do that”: DO IT. Start small and aim for HUGE.

Magic rarely comes from something you have been doing for the past 15 years.  Get out there and be awesome!

10/14/2014 2:51:30 PM

I have been digging deep in Azure and AWS a lot lately.  And have been looking at the new world of internet of things.  And eventually this combination leads you to the new Event Hubs tool from Azure and the Kinesis tool from Amazon.  While researching bits for a presentation on “A Tale of Two Clouds” where I am comparing the features of the two top cloud vendors I came across a great slide deck around “Event Loop”.  And nestled in that deck was this wonderful graphic that I am sure to re-use in my “Introduction to CQRS” presentation.

image

For those that are unaware, this graphic depicts a developer, presumably writing code for a given application, who is struggling with two common issues: disk I/O speed and contention. Generally an application will have one database behind it. And generally the application will be written in such a manner that read data is shaped in the same manner as the write data. A user of the application clicks a button and some data is immediately written to the database…and then the application immediately turns around and queries the database. Every user interaction is either reading data to paint the screen or writing data as part of the interaction. The longer you go in this model the more the single database starts to lock up the disk and the more contention starts to creep up on you (where an operation from user A is blocking an operation from user B).

The guy in this picture needs to learn about distributed systems, DDD, and the CQRS pattern. While I suggest that you don’t treat these as go-to patterns for a new application straight away, at least understanding what they can offer you is important. Key learnings can always be applied, even if you are just keeping your code loose enough to wire in some of those concepts down the road.
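
For readers who have not seen CQRS before, here is a deliberately tiny Python sketch of the core idea: writes go through one model, and reads come from a separate model shaped for the question being asked. All class, field, and customer names are invented for illustration, and the read side is updated in-process and synchronously only to keep the sketch short. In a real system that update would typically happen asynchronously, for example over a message bus.

```python
from dataclasses import dataclass, field

@dataclass
class OrderPlaced:
    """Event emitted by the write side when an order is accepted."""
    order_id: str
    customer: str
    total: float

@dataclass
class OrderWriteModel:
    """Command side: validates and records changes, never serves queries."""
    events: list = field(default_factory=list)
    subscribers: list = field(default_factory=list)

    def place_order(self, order_id, customer, total):
        if total <= 0:
            raise ValueError("order total must be positive")
        event = OrderPlaced(order_id, customer, total)
        self.events.append(event)
        for handle in self.subscribers:
            handle(event)  # simplification: synchronous, in-process delivery

@dataclass
class CustomerTotalsReadModel:
    """Query side: data pre-shaped for one screen, 'total spent per customer'."""
    totals: dict = field(default_factory=dict)

    def apply(self, event):
        self.totals[event.customer] = self.totals.get(event.customer, 0) + event.total

    def total_for(self, customer):
        return self.totals.get(customer, 0)

write_side = OrderWriteModel()
read_side = CustomerTotalsReadModel()
write_side.subscribers.append(read_side.apply)

write_side.place_order("o-1", "alice", 25.0)
write_side.place_order("o-2", "alice", 10.0)
write_side.place_order("o-3", "bob", 5.0)

print(read_side.total_for("alice"))
```

The point of the split is that the query never touches the store the writes go to, so reads and writes stop contending for the same rows and the same disk.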

10/10/2014 3:43:39 PM

I can’t wait to go to Silicon Valley this weekend.  I haven’t been there in a long time.  Taking the wife.  We figured it would be a great get away location for us!

At the event I will be talking about Azure Search and DocumentDB.  How to set them up, how to use them.  How to load data into them…etc.  Should be a good talk.  Let me know if you are in the area so that we can meet up!

10/6/2014 7:56:05 PM

I was just accepted to speak at this year’s Austin Code Camp! In this talk we will take a look at a standard web application hosted in Azure with a few different backing data stores. We will see how to wire up the common interactions (get one, get many, pagination, search, add, delete) as they pertain to each storage type. And then we will load up 1 million records and see how each of the different configurations performs in those common scenarios.

The Austin .NET User Group Is Proud To Sponsor Its Eighth Annual Code Camp

This Is a One Day Conference For The Community, By The Community.

Attendance costs a mere $10 for the entire day’s worth of content (less than the cost of two lattes at Starbucks)!

The conference is on October 18th at New Horizons Computer Learning Centers of Austin from 8 am to 5 pm. Spots are limited at our new location, so register today!

http://codecamp14.adnug.org/

10/6/2014 7:52:17 PM

Great news!  We have locked up October and November presentations.  We will have Josh Handel at the next Meetup discussing faceted search in Azure Search.  He will also show us the new Elastic SQL.  In November we will have Paul Drew show us how to build environments using BoxStarter and other tools.

If you would like to present please confirm with me so that I can get you on the schedule!

Thank you goes to Clear Measure for sponsoring us.  But we are certainly up for making the group better with more sponsors.  If you know of anyone that is interested in sponsoring the AzureAustin group please send them my way. 

New Azurians (73 total):

Let’s get this number up!  I would love to get us to 90 or 100 users by the end of the year if possible.  Let’s start a competition.  If you refer a friend to the group please send me your name and their name.  I will enter you to win something fun in a drawing at the end of the year.  Perhaps a ReSharper license or similar?  I am open to prize ideas.

Gil R Rosales

Azure info for the week:

Web Casts

- HATEOAS, REST, and Hypermedia Primer with Mat Velloso

- Azure Redis Cache 103 – Failover and Monitoring

- Machine learning, IoT: ThyssenKrupp uses predictive analytics to give burgeoning cities a lift

- Offline Sync with Donna Malayeri

Pod Casts

- Sharepoint apps on Azure

- Azure Podcast: Roundtable discussion on new Azure updates

- Evolution of CoreOS

- Building and managing scalable SaaS services

Posts

- Azure SQL Database point in time restore

- Azure web application gallery on azure.microsoft.com

- Introducing elastic scale preview for azure SQL Database

- New D-Series of Azure VMs with 60% faster cpus, more memory and local SSD disks

- Introducing DocumentDB

- How to guard your application from Azure outages

- Vowpal Wabbit Modules in AzureML

- Fastest way to spin-up your test lab environment in the cloud with Microsoft Azure

- SQL Server always on and ILB

- Chef and Orchestration

- Learn Chef – now with tracks for Windows Server and Ubuntu!

- Online learning and sub-linear debugging

- Diagnosing issues with ModSecurity in Azure Websites

- Azure search new samples and videos

- ModSecurity web application firewall on Azure Websites

- Monitoring, diagnosing and troubleshooting Azure Storage

- Troubleshooting common configuration issues with Azure backup

Snippets/Tools

- Announcing the 1.0.0-rc1 of Microsoft Azure webjobs SDK

News

- Media in the Cloud Summit

- Microsoft offers price reductions on select Azure services

- Microsoft Azure announced as cloud computing winner at 2014 Tech Impact Awards

Free

- Scott Gu free ebook "Building cloud apps with azure"

- Microsoft Azure courses

Jobs

- If you like building cutting edge apps targeted for Azure come apply at the coolest engineering shop in Austin - Clear Measure!  We are always looking for talented people.  http://www.clear-measure.com/careers/

- Others listed on Indeed!

Last Week’s Outages (status)

Man oh man have we had some outages in the past week or so.  Egad.  We have been helping folks that thought the simple fact of being in the cloud means auto high availability and scale.  Nope!

October 2014

10/2

Cloud Services \ Service Management - Southeast Asia - Partial Service Interruption

This incident has now been mitigated. From 01 Oct, 2014 12:30PM to 02 Oct 3:00AM UTC a small subset of customers in Southeast Asia using Cloud Services might have experienced issues while performing service management operations, such as creating new Cloud Services or updating existing services.

10/1

Notification Hub - worldwide advisory

We have confirmed that all disabled accounts are operational again and Notification Hub functionality has returned to normal for all impacted customers. From 00:00 UTC on 9/30/2014 through 23:59 UTC on 9/30/2014 a subset of customers who used Notification Hub may have seen incorrect billing resulting in disabling their subscriptions. Azure Billing Support will continue to work with all impacted customers to resolve their related billing discrepancies.

10/1

HDInsight - West US - Partial Performance Degradation

This incident has now been mitigated. From 6:30 UTC to 11:52 UTC on 9/30/2014 a subset of HDInsight customers in West US region may have experienced an inability to create clusters. Existing clusters should have been unaffected.

September 2014

9/30

Cloud Services and Virtual Machines - East US - Advisory (Limited Impact)

From 29 Sept, 2014 22:00 to 30 Sept 2014 11:30AM UTC a very small subset of customers in East US might have experienced issues with RDP operation to newly created Virtual Machines or Cloud Services. This incident has now been mitigated.

9/27

Service Management - Multiple Services - Southeast Asia - Partial Performance Degradation

From 27 Sep, at 11:30 to 15:20 UTC, customers may have experienced issues performing Service Management functions across Multiple Services deployed in Southeast Asia. Customers may have been unable to create new Azure Services, and may have faced issues updating, or deleting existing Azure Services. Availability of existing Azure Services in Southeast Asia was not impacted by this incident. Engineers have validated that full Service Management functionality has been restored to Southeast Asia. This incident is now mitigated.

9/27

Network Infrastructure - Central US - Partial Service Interruption

From 11:47 to 12:04 UTC on 27, September, 2014 a subset of customers using Virtual Machines, Cloud Services, Storage, SQL, or associated Service Management functions for services in the Central US region may have been unable to access their service resources. This incident has now been mitigated.

9/26

HDInsight - West Europe - Partial Performance Degradation

From 26 Sep, at 11:34 to 14:45 UTC, customers may have experienced intermittent failures when attempting to provision new HDInsight clusters in West Europe. Engineers have completed their mitigation steps and validated that this issue has been resolved. This incident is now mitigated.

9/26

RemoteApp - Multi-Region - Advisory

From 25, Sep 2014 11:15 UTC to 26, Sep 2014 00:26 UTC, A subset of customers in Multiple Regions may have experienced issues with their Remote App Data or Services not properly populating within the Remote App UI. Core Remote App functionality was not impacted by this incident. A subset of customers may have also had issues creating new Remote App Services or performing Service Management tasks during this incident. Engineers have validated that full functionality has returned for the Remote App Service. This incident is now mitigated.

9/25

Visual Studio Online - Multi-Region - Partial Service Interruption

From 25 Sep, 2014 14:15 to 14:34 UTC customers using Visual Studio Online in Multiple Regions may have experienced issues logging in to the VSO service. More information may be found at http://blogs.msdn.com/b/vsoservice/archive/2014/09/25/issues-with-vs-online-9-25-investigating.aspx. This incident has now been mitigated.

9/25

Multi-Factor Authentication - Multi-Region - Advisory

From 25, Sep 2014 at 16:54 UTC to 19:20 UTC, Customers using Multi-Factor Authentication in Multiple Regions may have experienced intermittent failures when attempting to authenticate with their mobile devices. A retry of the request would likely have resulted in a successful authentication. Engineers have verified that full Multi-Factor Authentication functionality has been returned to impacted mobile customers. This incident has now been mitigated.

9/25

Azure Redis Cache - Multi-Region - Advisory

From 25 Sep, 2014 14:00 to 16:30 UTC Azure Redis Cache customers may have experienced instances where their Redis Cache data did not properly populate in their Azure Management Portal. Core Azure Redis Cache availability was not affected by this incident. Engineers have validated that Redis Cache data is properly populating in Azure Management Portals and this incident has now been mitigated.

9/25

Service Bus - South Central US - Partial Service Interruption

From 03:10 UTC to 04:09 UTC on 25th September, 2014 a subset of customers using Service Bus in South Central US may have experienced server errors or time outs while accessing their resources. This incident has now been mitigated.

9/25

StorSimple - West US, North Europe and Japan East - Partial Service Interruption

From 03:10 UTC to 04:09 UTC on 25th September, 2014 a subset of customers using StorSimple in West US, North Europe and Japan East may have been unable to restore customer VMs. This incident has been mitigated.

9/25

Websites - South Central US - Partial Service Interruption

From 03:10 UTC to 04:09 UTC on 25th September, 2014 a subset of customers with Websites in the South Central region may have experienced latency or error 503 when attempting to access their Websites. This incident has now been mitigated.

9/25

HDInsight - South Central US - Partial Service Interruption

From 03:10 UTC to 04:09 UTC on 25th September, 2014 a subset of customers using HDInsight may have been unable to create new HDInsight clusters. This incident has now been mitigated.

9/25

Visual Studio Online - Multi-Region - Partial Service Interruption

From 03:23 UTC to 04:19 UTC on 25th September, 2014 a subset of customers using Visual Studio Online in Multi-Region may have been unable to log in or access Visual Studio Online resources. This incident has now been mitigated.

9/25

Management Portal - Multiple Regions

This incident has now been mitigated. From 03:10 to 04:09 UTC on 25, September, 2014 a subset of customers using Management Portal in Multiple Regions may have experienced errors or time outs when attempting to access the Management Portal.

9/25

Cloud Services and Virtual Machines \ Service Management - South Central US - Partial Service Interruption

From 03:10 UTC to 04:09 UTC on 25 September, 2014 a subset of customers using Virtual Machines, Cloud Services and Virtual Machines \ Service Management in South Central US may have experienced inability to access service resources. This incident has now been mitigated.

9/24

Media Services \ Encoding - East Asia - Partial Performance Degradation

From 22:13 UTC on 23 September to 01:40 UTC on 24 September, 2014 customers using Media Encoding Services may have experienced extended encoding times when processing their media jobs. This incident has now been mitigated.

9/22

ExpressRoute - Silicon Valley, Washington DC and London - Partial Performance Degradation

From 20 Sep, 2014 01:00 UTC to 22 Sep, 23:25 UTC, ExpressRoute customers in Silicon Valley, Washington DC, and London may have had difficulty accessing Storage or SQL endpoint resources via ExpressRoute. Other ExpressRoute services were not impacted by this incident. Engineering teams have validated their mitigation and full ExpressRoute Service for customers accessing Storage and SQL endpoints has been restored. This incident has now been mitigated.

9/22

Management Portal - Multi-Region - Advisory

From 22 Sep, 2014 11:06 to 15:00 UTC a subset of customers may have experienced intermittent issues logging in to the Management Portal, or seen Error 503 when logging in. A subset of customers may have also seen Error 503 if they refreshed their successfully logged-in instance of the Management Portal. In both cases, a retry of the login operation would likely have resulted in a successful connection. Engineers have validated that full Management Portal connectivity has been restored to the subset of impacted customers. This incident has now been mitigated.

9/22

Management Portal - Multi-Region - Partial Service Interruption

From 11:06 - 11:39 UTC on 22 September, 2014 a subset of customers using Management Portal in multiple regions may have experienced an error when attempting to log into their Management Portal. This incident has now been mitigated.

9/20

Network Infrastructure - Southeast Asia - Partial Service Interruption

From 20th Sep, 2014 at 2:17 AM to 3:35 AM UTC a subset of customers using Azure Services in Southeast Asia may have experienced connectivity issues due to an interruption to our Network Infrastructure. If you encountered any errors please retry your request. This incident has now been mitigated.

10/6/2014 7:49:25 PM

Another speaking engagement has been confirmed!  I am delivering a talk at this year’s InnoTech Austin.  The talk is entitled “A tale of two clouds – a side by side comparison of Microsoft Azure and AWS”.

Here is the talk’s abstract:

An Architectural feature comparison of the two most prevalent public clouds, Windows Azure and Amazon Web Services. This presentation will feature a side-by-side comparison of the features of Amazon Web services and Windows Azure covering categories such as:

• Deployment, Management and Automation
• Compute
• Storage
• Messaging
• Email
• Networking
• Security
• Operating System & Data Transfer
• Development Languages and Runtime Support

I will fill in more details after the talk is complete!

10/2/2014 5:13:51 PM

There have been a few outages in Azure land in the past month or so.  This of course has impacted some of their customer base and created the loud question “how do I ensure that their outages don’t bring my applications down?”.  The first step is to understand that simply having your application in a cloud does not make that application Highly Available.  HA is not a default behavior…it is a planned and configured behavior.  Let’s learn how to make an application highly available in Azure.

image

Availability sets, planned, and unplanned maintenance events

Planned: A planned maintenance event is one that is done to the underlying fabric of the Azure system.  This doesn’t always impact your applications, but from time to time a restart of the VM may be required.  Generally these types of events will be communicated ahead of time so that you can react to them if need be.  But we will see that with proper planning around this notion you should never be impacted.

Unplanned: An unplanned event is when something happens to the physical hardware that your application is running on.  This could be a disk failure, a server failure, etc.  In this case the Azure system will relocate your VM or web site to hardware that is not having issues and attempt to bring it back up.  If you only have one instance of your application running then some down time is to be expected.  However, we will see how even this sort of event can be planned for.

Availability sets:  An availability set is the first way to guard against the impact of planned and unplanned events.  You configure your application for redundancy by placing more than one instance of it in the same availability set.  When you put VM1 and VM2 in the same availability set you get some immediate protection, because each VM is automatically placed in what is called a Fault Domain (FD) and an Update Domain (UD).  The FD protects you from hardware, power, and switching failures.  The UD protects you during update rollouts, since only one UD at a time is taken down and brought back up.  There are five UDs and two FDs per availability set.  As you add a new VM it is added to the next FD and UD.  VM1 goes to UD1 and FD1.  VM2 goes to UD2 and FD2.  VM3 goes to UD3 and FD1.  Etc.  When you get to VM6 it rotates back to UD1.  And so on.  Neither the FDs nor the UDs will protect you from operating system or application failures!
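The rotation described above can be sketched in a few lines.  This only models the round-robin pattern as the paragraph describes it (five UDs, two FDs, 1-based numbering).  It is not an Azure API call, and the function names are made up for illustration.

```python
UPDATE_DOMAINS = 5  # five UDs per availability set, per the description above
FAULT_DOMAINS = 2   # two FDs per availability set, per the description above


def assign_domains(vm_number):
    """Return (update_domain, fault_domain) for the Nth VM added (1-based)."""
    ud = (vm_number - 1) % UPDATE_DOMAINS + 1
    fd = (vm_number - 1) % FAULT_DOMAINS + 1
    return ud, fd


def placement(vm_count):
    """Map VM names to their (UD, FD) pair for the first vm_count VMs."""
    return {f"VM{n}": assign_domains(n) for n in range(1, vm_count + 1)}


if __name__ == "__main__":
    for name, (ud, fd) in placement(6).items():
        print(f"{name}: UD{ud}, FD{fd}")
```

Running it for six VMs shows VM6 landing back on UD1, which is why a sixth instance adds capacity but no further update-domain separation.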

Understanding availability sets in your application: With this understanding you can see why it is important to put each tier of your application into its own availability set, with at least two instances of your application per availability set.  You would want your web tier in Availability Set 1.  Your cron jobs in Availability Set 2.  Your read layer’s denormalization service in Availability Set 3.  Etc.  This way you will always have at least one instance of each application concern available.
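A quick sanity check along these lines can be scripted against whatever inventory you keep of your deployment.  The sketch below assumes a plain dictionary of availability set name to VM names.  The set and VM names are invented, and nothing here talks to Azure.

```python
MIN_INSTANCES = 2  # at least two instances per availability set, per the text


def under_provisioned(availability_sets):
    """Return the names of availability sets with fewer than MIN_INSTANCES VMs."""
    return sorted(
        name
        for name, vms in availability_sets.items()
        if len(vms) < MIN_INSTANCES
    )


if __name__ == "__main__":
    # Hypothetical inventory: one availability set per application tier.
    inventory = {
        "web": ["web1", "web2"],
        "cron": ["cron1"],
        "denormalizer": ["denorm1", "denorm2"],
    }
    for name in under_provisioned(inventory):
        print(f"{name}: needs at least {MIN_INSTANCES} instances")
```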

How to configure an availability set

Now that we understand what role an availability set plays for our application, let’s quickly see how to implement one.  Configuring an availability set adds only a few extra steps when creating your VMs.

Start by creating the first VM from the quick create screen.

image

Then click on the name of the new VM to see the details for that VM instance.

image

Then click the configure tab.

image

Now you can set the availability set.  If this is the first time you have done this there won’t be any existing sets in the list.  So select create availability set from the drop down menu.

image

And enter the name of the availability set.  Then click save.

image

Once you click save, the VM will be taken down and reconfigured.  This can take a minute or so.  Then it will be started back up.

image

image

Once the VM comes back online you can see a new message stating that there is only one VM in the set and that this isn’t enough to ensure the SLA for the availability set.

image

Now we can create the next VM(s).  This can’t currently be done via the quick create as there is no way to associate the new VMs with an existing cloud service.  And in order to share the availability set the new VMs must be on the same cloud service.  For that reason use the create from gallery option to get into the details of your new VM’s creation.

Notice in the Virtual machine configuration screen that we can set the existing cloud service and pick the new availability set.

image

Now we can see in both VM configurations that they are attached to the same availability set.

image

For more on understanding availability sets take a look at the MS documentation: http://azure.microsoft.com/en-us/documentation/articles/virtual-machines-manage-availability/

9/25/2014 2:57:54 PM

Normally I don’t mind when something automated PERIODICALLY fails due to the underpinnings of the infrastructural world.  However, lately I have been banging my head on all things Azure.  Last night scripts that have been running amazingly for a long while stopped working.  I received the error “ConflictError: Windows Azure is currently performing an operation on this hosted service that requires exclusive access.”

This started last night around 8pm.  It lasted until at least 8:30 this morning.  EGAD.  Some warning please. 

But again – no real problem.  The scripts I am running spin up entire environments, install a client’s software, and then run a suite of tests against them. So I started them up again this morning.  Figured I would start four new environments just in case there were still some issues.  Good thing!  I had a 25% pass rate this morning.  Only one of the environments completed successfully.  Same script I have been using for a very long time…and only one out of four worked.

image

Time to spin up another four!
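For shorter conflicts of this kind, one common mitigation is to wrap the provisioning call in a retry with exponential backoff.  The sketch below is only an illustration of that technique and not the scripts described here.  `ConflictError` is a hypothetical stand-in exception for the "requires exclusive access" error, and `operation` is whatever callable kicks off your provisioning step.  It would not have helped with an outage lasting half a day, but it does smooth over brief contention.

```python
import time


class ConflictError(Exception):
    """Hypothetical stand-in for the 'requires exclusive access' error."""


def retry_on_conflict(operation, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on ConflictError wait and retry with exponential backoff.

    Waits base_delay, then 2x, 4x, ... between attempts. Re-raises the
    last ConflictError once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except ConflictError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists only so the delay can be swapped out in tests.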

9/22/2014 3:23:05 PM

Last week was great fun. We got to see Andrew Siemer (me) present on the topic of Azure Search and DocumentDB. Next month we have a couple of people that have potentially raised their hands to speak…but are not yet entirely confirmed. If you are interested in sharing with the group let me know!  Here is the code and slides from last week’s presentation: https://github.com/asiemer/AzureAustin-20140917

Thank you goes to Clear Measure for stepping up to sponsor the event last week. If you know of anyone that is interested in sponsoring the AzureAustin group please send them my way. I would like to rotate the sponsorship of the event from time to time. Share the love!

New Azurians (72 total):

Merwant Chinta, Joe Daily, Rick Allen

Azure info for the week:

Web Casts

- Azure Redis Cache 101 – Introduction to Redis

- Azure Redis Cache 102 – Application Patterns

Pod Casts

- Episode 44 – Docker & Kubernetes

Posts

- Microsoft Machine Learning Hackathon 2014

- Extensibility and R Support in the Azure ML Platform

- Microsoft brings Hadoop to China with the Public Preview of HDInsight

- Azure Media Services RTMP Support and Live Encoders

- Enabling CDN for Azure Websites

- Azure Automation: Controlling Runbook Streams for Testing and Troubleshooting

- Azure Media Services now supported by JW Player

- Scale is never a problem with Azure Websites

- Azure Websites Virtual Network Integration

- Getting Started with Azure Management Libraries for Java

Snippets/Tools

- Not entirely AZURE…but nice: Jump-Location – A Change Directory (CD) PowerShell command that reads your mind

Free

- Scott Gu free ebook "Building cloud apps with azure"

- Microsoft Azure courses

Jobs

- If you like building cutting edge apps targeted for Azure come apply at the coolest engineering shop in Austin - Clear Measure!  We are always looking for talented people.  http://www.clear-measure.com/careers/

- Others listed on Indeed!

Last Week’s Outages (status)

9/22 Management Portal - Multi-Region - Advisory

From 22 Sep, 2014 11:06 to 15:00 UTC a subset of customers may have experienced intermittent issues logging in to the Management Portal, or seen Error 503 when logging in. A subset of customers may have also seen Error 503 if they refreshed their successfully logged-in instance of the Management Portal. In both cases, a retry of the login operation would likely have resulted in a successful connection. Engineers have validated that full Management Portal connectivity has been restored to the subset of impacted customers. This incident has now been mitigated.

9/22 Management Portal - Multi-Region - Partial Service Interruption

From 11:06 - 11:39 UTC on 22 September, 2014 a subset of customers using Management Portal in multiple regions may have experienced an error when attempting to log into their Management Portal. This incident has now been mitigated.

9/20 Network Infrastructure - Southeast Asia - Partial Service Interruption

From 20th Sep, 2014 at 2:17 AM to 3:35 AM UTC a subset of customers using Azure Services in Southeast Asia may have experienced connectivity issues due to an interruption to our Network Infrastructure. If you encountered any errors please retry your request. This incident has now been mitigated.

9/17 Network Infrastructure - West Europe, North Central US and South Central US

This incident has now been mitigated. As of 17 Sept, 2014 14:41 to 18:18 UTC we experienced interruption to portions of the Network Infrastructure in North Central US, South Central US and West Europe. Some customers in the affected regions might experience errors while accessing Azure resources, such as Storage, SQL Database, Cache Services, Websites, Azure Preview portal, API Management. Preliminary investigations indicate that a portion of network infrastructure which routes network traffic was in a degraded state, and engineers implemented a configuration change to mitigate the issue.

9/16 Cloud Services and Virtual Machines (Service Management) - North Europe - Advisory

This incident is now resolved. From 16th Sep, 2014 at approximately 7:15 AM to 10:15 AM UTC a limited subset of customers using Cloud Services or Virtual Machines in North Europe may have encountered delays when attempting to provision new services. Any requests submitted during this time frame have now processed correctly. Existing services were unaffected by this incident.

9/17/2014 1:25:41 PM

I read an interesting post from Scott Hanselman the other day pertaining to microphones and remote workers. Oddly enough we have been battling an issue in the office over the past couple of weeks where a person’s microphone would be awesome at the start of a meeting…but then slowly degrade over time.  Not degrade in quality, just that the volume would slowly get lower and lower.  So he would be speaking and it was as if he just started to walk away from the conversation.  As we use Google Voice, Skype, and GoToMeeting, there were three apps (we thought) potentially cancelling one another out.  We looked and looked through the app settings for the culprit and finally just decided to look for a solution that covers all apps.  As it turns out there is an easy one!

To permanently solve this issue for all applications start by going to the volume control in the system tray (lower right of the taskbar by default).  Right click.  Choose recording devices.

image

Select the device you are using for your microphone.  Click properties.

image

Then choose the advanced tab.

image

Uncheck the “Allow applications to take exclusive control of this device” option.

image

Click OK.  Good to go!

9/16/2014 10:17:00 PM

I have a talk coming up at AzureAustin (tomorrow actually).  It is about Azure’s new DocumentDB and Search services.  Both of these are data intensive.  Using crap data in those environments won’t quite give the full impact that might be needed for such a presentation.  Thanks to Jeep (I love JEEPS) I now have a fun set of data by simply spidering through some data on their end and making the generator just a touch smarter.

image

Obviously this problem is not solved by just displaying fancy but subtly different images.  The data that goes around the image needs to match too.  In this case I have an image naming scheme that constructs the image name from other parts of the data.  This way the data in the package matches the image it generates too.
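As a rough illustration of that kind of naming scheme, the sketch below derives an image file name from fields of a generated record.  The field names (`year`, `model`, `color`), the separator, and the file extension are assumptions made up for the example.  The actual generator and its scheme are not shown in this post.

```python
import re


def slug(value):
    """Lower-case a value and collapse anything non-alphanumeric into dashes."""
    return re.sub(r"[^a-z0-9]+", "-", str(value).lower()).strip("-")


def image_name(record, fields=("year", "model", "color"), extension="jpg"):
    """Build the image file name from the same fields stored in the record,
    so the picture always matches the data displayed next to it."""
    return "-".join(slug(record[f]) for f in fields) + "." + extension


if __name__ == "__main__":
    # Hypothetical generated record.
    record = {"year": 2014, "model": "Wrangler Unlimited", "color": "Flame Red"}
    print(image_name(record))
```

Because the name is a pure function of the record, regenerating the data set can never leave an image pointing at the wrong vehicle.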

image