BuildMaster TCP/Self-Hosted Agent Stability?



  • Hello,

    I just wanted to report a possible bug with the build agent's stability. I'm not sure which upgrade introduced this, but our build agents randomly terminate connections. It can happen in the middle of a build: the connection just closes. It appears to happen at random times, even after connecting successfully and executing a previous build step. The only fix I know of is to log in to the affected target server and restart the build agent service. Sometimes retrying the last failed step works, but most of the time it doesn't, so I've trained myself to simply log in and restart the agent service, after which everything works for a while. I believe this started a couple of versions ago; sorry I don't have better specifics. I figured it was temporary, but it just hasn't gone away. I'd be happy to provide additional details if requested.
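
    In the meantime, I've been considering scripting the manual fix as a stopgap. Something along these lines is what I have in mind; it's only a sketch, and the service name and port below are placeholders for whatever the agent is actually registered as on each server:

    // Watchdog sketch: probe the agent's TCP port and restart the service if it stops responding.
    // ServiceName and AgentPort are placeholders; run with admin rights on the agent server.
    using System;
    using System.Net.Sockets;
    using System.ServiceProcess;   // requires a reference to System.ServiceProcess.dll
    using System.Threading;

    class AgentWatchdog
    {
        const string ServiceName = "MyBuildMasterAgent"; // placeholder service name
        const int AgentPort = 1000;                       // placeholder agent port

        static void Main()
        {
            while (true)
            {
                if (!CanConnect("localhost", AgentPort))
                {
                    // Mirror the manual fix: restart the agent service.
                    var svc = new ServiceController(ServiceName);
                    if (svc.Status == ServiceControllerStatus.Running)
                    {
                        svc.Stop();
                        svc.WaitForStatus(ServiceControllerStatus.Stopped, TimeSpan.FromSeconds(30));
                    }
                    svc.Start();
                    svc.WaitForStatus(ServiceControllerStatus.Running, TimeSpan.FromSeconds(30));
                }

                Thread.Sleep(TimeSpan.FromMinutes(5)); // probe interval
            }
        }

        static bool CanConnect(string host, int port)
        {
            try
            {
                using (var client = new TcpClient())
                {
                    client.Connect(host, port);
                    return true;
                }
            }
            catch (SocketException)
            {
                return false;
            }
        }
    }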

    Our Environment:

    • The build agents are running on both Windows Server 2012 and 2012 R2
    • The build controller is running on Windows Server 2012
    • The servers are not on the same network (the build agents have ports accessible via the Internet, and the build controller communicates via those public ports)
    • The build agents utilize the “Agent.SecurityToken” app setting (we were previously using SSL as well, but around the same time this became a problem, SSL seems to have stopped working)
    • All build agents are configured as self-hosted TCP agents

    Sample Error:

    Logged	3/13/2014 3:42:07 PM
    Message	[Exec #258] Unhandled exception: System.InvalidOperationException: The connection was closed unexpectedly.
    at Inedo.BuildMaster.Extensibility.Agents.Tcp.TcpAgentClient.SendHandshake(String securityToken)
    at Inedo.BuildMaster.Extensibility.Agents.Tcp.TcpAgentClientPool.CreateClient(Endpoint endpoint)
    at Inedo.BuildMaster.Extensibility.Agents.ClientConnectionPool`2.AcquireConnection(TEndpoint endpoint)
    at Inedo.BuildMaster.Extensibility.Agents.Tcp.TcpAgentClientPool.GetClient(String hostName, Int32 port, String securityToken, Boolean ssl, Boolean ignoreCertificateErrors)
    at Inedo.BuildMaster.Extensibility.Agents.Tcp.TcpAgent.GetClient()
    at Inedo.BuildMaster.Extensibility.Agents.Tcp.TcpAgent.Inedo.BuildMaster.Extensibility.Agents.IRemoteMethodExecuter.InvokeMethod(MethodBase method, Object instance, Object[] parameters)
    at Inedo.BuildMaster.Extensibility.Agents.AgentExtensions.InvokeFunc[TResult](IRemoteMethodExecuter agent, Func`1 method)
    at System.Lazy`1.CreateValue()
    at System.Lazy`1.LazyInitValue()
    at Inedo.BuildMaster.Extensibility.Agents.Tcp.TcpAgent.<.ctor>b__2()
    at Inedo.BuildMaster.Extensibility.Agents.RemoteFileOperationsExecuter.Inedo.BuildMaster.Extensibility.Agents.IFileOperationsExecuter.GetBaseWorkingDirectory()
    at Inedo.BuildMaster.Windows.ServiceApplication.PlanExecuter.AgentBasedActionExecuter.InitializeRemoteConfiguration()
    at Inedo.BuildMaster.Windows.ServiceApplication.PlanExecuter.AgentBasedActionExecuter.Initialize()
    at Inedo.BuildMaster.Windows.ServiceApplication.PlanExecuter.ExecutingPlan.ExecutePlan()
    Stack Trace	at System.Environment.GetStackTrace(Exception e, Boolean needFileInfo)
    at System.Environment.get_StackTrace()
    at Inedo.BuildMaster.Diagnostics.DatabaseErrorMessenger.Inedo.Diagnostics.IMessenger.Message(IMessage message)
    at Inedo.Diagnostics.Logger.Message(MessageLevel messageLevel, String message)
    at Inedo.BuildMaster.Windows.ServiceApplication.PlanExecuter.ExecutingPlan.ExecutePlan()
    at System.Threading.Tasks.Task.Execute()
    at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
    at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
    at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
    at System.Threading.Tasks.Task.ExecuteEntry(Boolean bPreventDoubleExecution)
    at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
    at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
    at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
    at System.Threading.ThreadHelper.ThreadStart(Object obj)
    

    Product: BuildMaster
    Version: 4.1.5



  • Errors like this are indicative of network-level problems and can be reproduced by doing things like disabling network adapters, unplugging cables, and so on. The most common cause we’ve seen relates to buggy network device firmware (QoS, NAT, etc.). Hopefully your network team can use trace tools to see where things are problematic and why.

    The self-hosted agent does not have any additional network fault tolerance beyond what’s in TCP/IP, so unexpected network failures like this will crash the agent process. You can set the agent service to “auto restart” if it isn’t already, but beyond that there’s not much that can be done.
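
    To make the distinction concrete: the agent performs what amounts to a single connect-and-handshake with no retry layer, so a momentary drop surfaces directly as the exception you posted. Roughly speaking, any backoff-and-retry logic like the sketch below (host, port, and attempt counts are purely illustrative) would have to live outside the agent:

    using System;
    using System.Net.Sockets;
    using System.Threading;

    static class RetryConnect
    {
        // Illustrative only: the kind of backoff-and-retry the self-hosted agent does not do for you.
        public static TcpClient ConnectWithRetry(string host, int port, int maxAttempts)
        {
            var delay = TimeSpan.FromSeconds(2);
            for (int attempt = 1; ; attempt++)
            {
                try
                {
                    var client = new TcpClient();
                    client.Connect(host, port);
                    return client; // caller is responsible for disposing
                }
                catch (SocketException)
                {
                    if (attempt >= maxAttempts)
                        throw; // give up after the last attempt
                    Thread.Sleep(delay);
                    delay = TimeSpan.FromSeconds(delay.TotalSeconds * 2); // exponential backoff
                }
            }
        }
    }

    For example, ConnectWithRetry("agent.example.com", 1000, 5) (hostname and port made up) would keep trying for a couple of minutes before giving up. Again, this is only to illustrate what's missing; it isn't something you can bolt onto the agent itself.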

    If you can’t resolve the network problems, I’d consider switching to the IIS-hosted agent. IIS does have additional fault tolerance and can handle bad networks a little better.



  • For what it's worth, the build agents are running on Windows Azure VMs that host various applications. None of those other applications experience network issues, hence my belief that the problem may be related to the build agent. I'm not saying there is definitely an issue with the agent; it just seems that way. Furthermore, our monitoring software also indicates that the network is generally reliable and available.

    The specific VMs in question run fine, but the build controller randomly (or seemingly so) generates the error message below throughout the day, even when there is no build activity. Each day I have to clear out anywhere from 5 to 15 of these errors:

    Logged	3/14/2014 11:08:54 AM
    Message	Error scanning agent for DEVPM3 (18)
    Stack Trace	at System.Net.Sockets.TcpClient..ctor(String hostname, Int32 port)
    at Inedo.BuildMaster.Extensibility.Agents.Tcp.TcpAgentClientPool.CreateClient(Endpoint endpoint)
    at Inedo.BuildMaster.Extensibility.Agents.ClientConnectionPool`2.AcquireConnection(TEndpoint endpoint)
    at Inedo.BuildMaster.Extensibility.Agents.Tcp.TcpAgentClientPool.GetClient(String hostName, Int32 port, String securityToken, Boolean ssl, Boolean ignoreCertificateErrors)
    at Inedo.BuildMaster.Extensibility.Agents.Tcp.TcpAgent.GetClient()
    at Inedo.BuildMaster.Extensibility.Agents.Tcp.TcpAgent.GetAgentStatusInternal(IHostedAgentContext context)
    at Inedo.BuildMaster.Windows.ServiceApplication.AgentUpdater.CheckServer(Servers server)



  • Ah, thanks for the clarification; I had assumed they were all in-house servers.

    If these agents are accessed over the internet on a public cloud like Azure, then this behavior is not surprising. As I mentioned, the agents do not have any error correction beyond what's in TCP/IP, and there is a lot of network infrastructure between your BuildMaster server and an Azure VM.

    You can change the Agent Update Throttle in All Settings to reduce the number of these errors, or switch to the IIS-hosted agent, which may prove more reliable.



  • I'll try changing the agent type to IIS-hosted and see what comes of it. It just seems a little strange, since we've been running with the same configuration (TCP-hosted) since BM 4.0.5 without any issues. At some point (I believe within the last month), all of the build agents began failing at random times. Whatever version that was, around the same time we were no longer able to use SSL on the agents, and we haven't tried using it since.


