Thursday, September 28, 2017

Back up your Dynamics GP SQL Server databases directly to Azure Storage in minutes!

My blog has moved!  Please visit the new blog at:  https://blog.steveendow.com/

I will no longer be posting to Dynamics GP Land, and all new posts will be at https://blog.steveendow.com

Thanks!



By Steve Endow

At the GP Tech Conference 2017 in lovely Fargo, ND, Windi Epperson from Advanced Integrators had a great session about Disaster Recovery. One topic she discussed was the ability to use the Dynamics GP Back Up Company feature to save SQL backups directly to Azure.


I think doing SQL backups to Azure is a great idea.  There are countless tales of SQL backups not being done properly, being lost, or not being retained, so having an option to send an occasional SQL backup to Azure is a welcome safety net.

But this option is a manual process run from the Dynamics GP client application: it is not scheduled, and it does not use the "Copy-only backup" option, so the backups will become part of the SQL backup chain if the customer also has a scheduled SQL backup job.  So, as Windi explained, it may be a great option for very small customers who can reliably complete the task manually on a regular basis.

But how about setting up a backup job in SQL Server that will occasionally send a backup to Azure?

It turns out that the process is remarkably easy and takes just a few minutes to set up and run your first backup to Azure Storage.

NOTE: From what I can tell, SQL backups to Azure are supported in SQL 2012 SP1 CU2 or later.  And it appears that the backup command syntax for SQL 2012 and 2014 is slightly different from the newer syntax used by SQL 2016.
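To illustrate the difference, here is a minimal T-SQL sketch of both variations, assuming you already have a storage account and container created (covered below).  The storage account name (mystorageaccount), container name (gpbackups), and credential name are placeholders I made up for this example, and I'm using the Dynamics GP sample company database TWO; substitute your own values and verify the exact syntax against the documentation for your SQL Server version.

-- SQL 2012 SP1 CU2 / SQL 2014: the credential holds the storage account name and access key
CREATE CREDENTIAL AzureBackupCredential
WITH IDENTITY = 'mystorageaccount',            -- Azure Storage account name (placeholder)
     SECRET = '<storage account access key>';  -- access key from the Azure portal

BACKUP DATABASE TWO
TO URL = 'https://mystorageaccount.blob.core.windows.net/gpbackups/TWO.bak'
WITH CREDENTIAL = 'AzureBackupCredential',
     COPY_ONLY,      -- keeps this backup out of the regular backup chain
     COMPRESSION,
     STATS = 10;

-- SQL 2016 and later: the credential is named after the container URL and holds a shared access signature
CREATE CREDENTIAL [https://mystorageaccount.blob.core.windows.net/gpbackups]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<SAS token, without the leading question mark>';

BACKUP DATABASE TWO
TO URL = 'https://mystorageaccount.blob.core.windows.net/gpbackups/TWO.bak'
WITH COPY_ONLY, COMPRESSION, STATS = 10;

Either variation can be dropped into a SQL Agent job step to send an occasional backup to Azure on a schedule, and the COPY_ONLY option keeps these backups from interfering with an existing backup chain.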


The hardest part is setting up your Azure account and creating the appropriate Azure Storage account.  It took me a few tries to find the correct settings.

First, you have to have an Azure account, which I won't cover here, but it should be a pretty simple process.  Here is the sign up page to get started:  https://azure.microsoft.com/en-us/free/

Once you have your Azure account set up and have logged in to the Azure Portal (https://portal.azure.com), click on the "More Services" option at the bottom of the services list on the left.  In the search box, type "storage" and a few options should be displayed.

I chose the newer "Storage Accounts" option (not "classic").  To pin this to your services list, click the star to the right.


Tuesday, September 26, 2017

Free SFTP file transfer and data export tool for Dynamics GP file-based integrations

By Steve Endow

A somewhat common requirement for file-based integrations between Dynamics GP and external services or SaaS solutions involves uploading or downloading files from an SFTP server (SSH File Transfer Protocol, which is completely different from the similarly named FTP and FTPS).  SFTP has some technical quirks, so it is often a hassle for customers to automate SFTP file transfers as part of their Dynamics GP integrations.

Some of those integrations also involve exporting data from Dynamics GP to a CSV file and uploading that data to an SFTP server.

To handle this task, I have developed an application that can export data from GP, save it to a CSV file, and upload it to an SFTP server.  It can also download files from an SFTP server.  The tool is fully automated, can be scheduled using Windows Task Scheduler, and includes file archiving, logging, and email notification in case of errors.

If you use Blackline, Coupa, IQ BackOffice, or any other provider or service that requires upload or download of files with an SFTP server, this tool may be helpful.  It can be used in place of WinSCP or similar tools that require command line scripting.

I am offering this tool for free to the Dynamics GP community.  It can be downloaded from my web site at:

https://precipioservices.com/sftp/

The download includes a user guide and sample configuration file.  There are quite a few configuration settings, so please make sure to review the documentation to understand how the settings are used.

If you end up using the Precipio SFTP tool, I would love to hear which system or service you are using it with and how it ends up working for you.

I started a thread on the GPUG Open Forum if you want to discuss the SFTP tool:

https://www.gpug.com/communities/community-home/digestviewer/viewthread?GroupId=247&MessageKey=d6f5ce8b-1fdd-4fb1-abcc-9e7e529ce013&CommunityKey=4754a624-39c5-4458-8105-02b65a7e929e&tab=digestviewer


If you have questions or encounter issues, you can contact me through my web site at:

https://precipioservices.com/contact-us/



Steve Endow is a Microsoft MVP in Los Angeles.  He is the owner of Precipio Services, which provides Dynamics GP integrations, customizations, and automation solutions.

You can also find him on Twitter, YouTube, and Google+









Wednesday, September 20, 2017

The 10th and 11th ways you can lose your SQL data...

By Steve Endow

Brent Ozar has an excellent post where he shares 9 stories about how customers lost some or all of their SQL Server data.

https://www.brentozar.com/archive/2015/02/9-ways-to-lose-your-data/


What's great about his stories is that as I read each one, I thought "Yep, I can totally see that happening."  A simple oversight, a small mistake, one person making a change without realizing it affected other systems, or simply forgetting to change back a single setting in SQL Server.  The one about invalid SQL SMTP settings preventing error emails from going out reminded me of my recent Synology drive failures, as I also had invalid SMTP settings and hadn't received the hundreds of error emails telling me I had a problem--so I am certain that is a common symptom.
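On the SQL Server side, one simple habit that guards against this particular failure mode is to periodically send a test message through Database Mail and glance at its logs.  A minimal sketch, assuming Database Mail is already configured; the profile name and recipient address below are placeholders:

-- Send a test message through Database Mail (profile name and recipient are placeholders)
EXEC msdb.dbo.sp_send_dbmail
    @profile_name = 'GP Alerts',
    @recipients   = 'admin@example.com',
    @subject      = 'Database Mail test',
    @body         = 'If you received this, SQL Server can still send alert emails.';

-- Review recent Database Mail errors and failed messages
SELECT TOP (50) * FROM msdb.dbo.sysmail_event_log ORDER BY log_date DESC;
SELECT TOP (50) * FROM msdb.dbo.sysmail_faileditems ORDER BY send_request_date DESC;

If the test message never arrives, you have found a broken notification path before it hides a real failure.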

While stories about hurricanes, floods, tornadoes, or fires may provide great drama for discussion about disaster recovery, I suspect that there are far more disasters that are caused by a few clicks of a mouse, followed by "Ooops." (or "OH MY GOD WHAT HAVE I DONE???")

I have two data loss stories of my own to add to the SQL data loss lore.


Pulling the Wrong Drive

Many years ago, I was a "business systems consultant" for a Big 6 (at the time) consulting firm and somehow ended up helping a customer with their Solomon IV implementation after their sole IT employee quit.  I knew Solomon IV, knew VB, knew SQL, and knew hardware, so I was juggling everything and helping them finish their implementation.

Their Hewlett Packard server that hosted the Solomon IV databases was having some issues with its RAID array.  The server had mirrored drives that hosted the database files, and occasionally that mirror would 'break' for no good reason.  Windows would mark one drive as inactive, and the server would run on one of the drives until we removed the inactivated drive, reinserted it, and repaired the array.  This had happened once or twice before, and I was on site at the customer when it happened again.  I checked Windows, checked the array, confirmed the mirror had broken.  I then pulled the drive, reinserted the drive, and then started the array rebuild.  No problem.

Shortly after that, a user noticed that a transaction they entered that morning was no longer available in Solomon.  Then another user.  Then another.  We eventually discovered that all of the transactions and data that had been entered that day were gone.  What happened?

After pondering for a while, I realized what I had done.  When the RAID mirror broke, Windows would say that one drive had been inactivated, but it wasn't always clear which drive had been inactivated.  You had to poke around to figure out if it was the drive on the left or the drive on the right--I don't remember the process, and it might have even been as high tech as watching to see which blinky light on one of the drives wasn't blinking.

I had either misread the drive info or not looked carefully enough, and I had pulled out the wrong drive.  The active drive.  The one that was working and had been saving the transactions and data that day.  After I reinserted the drive, I then chose the 'bad' drive, the one that hadn't been active at all that day, marked it as the primary, and rebuilt the mirror with the old data from that drive.  Thereby losing the data that had been entered that day.

This was pre-SQL Server, so we didn't have transaction log backups; even if we had a full backup from the prior evening, it wouldn't have helped, as it was only that day's data that was lost.  Fortunately, I think it was only mid-day, so the users only lost the data from that morning and were able to reconstruct the transactions from paper, email, and memory.

Ever since I made that mistake, I am extremely paranoid about which physical drive is mapped to RAID arrays or Windows drive letters.  If you've built a PC or server in the last several years, you may know that Windows will assign drive letters semi-randomly to SATA drives.  And when I had two bad drives in my Synology, I double and triple checked that the drive numbers provided by the Synology did in fact map to the physical drives in the unit, from left to right.

I'm hoping that I never pull the wrong drive again.


Test vs. Production

In Brent's blog post, he shared a story about someone logging into the wrong server--they thought they had logged into a test environment, but were actually dropping databases in production.

I have a similar story, but it was much more subtle, and fortunately it had a happier ending.

I was testing a Dynamics GP Accounts Payable integration script.  I must have been testing importing AP invoices, and I had a script to delete all AP transactions from the test database and reload sample data.  So I'm running my scripts and doing my integration testing, and a user calls me to tell me that they can't find an AP transaction.  We then start looking, and the user tells me that transactions are disappearing.  What?

As we were talking, all of the posted AP transactions disappeared.  All AP history was gone.

Well, that's weird, I thought.

And then it hit me.  My script.  That deletes AP transactions.  That I ran on the Test database.

But how?

Somehow, I apparently ran that script against the production company database.  I was probably flipping between windows in SQL Management Studio and ended up with the wrong database selected in the UI.  And the customer had so much AP data that it took several minutes to delete it all, as I was talking to the user, and as we watched the data disappear.

You know that gut wrenching feeling of terror when your stomach feels like it's dropped out of your body?  Followed by sweat beading on your brow?  That's pretty much how I felt once I guessed that I had probably accidentally run my Test Delete script on the production database.  Terror.

In a mad scramble that amazes me to this day, I somehow kept my sanity, figured out what happened, and came up with an insane plan to restore the AP data.  Fortunately, the customer had good SQL backups and had SQL transaction logs.  For some reason, I didn't consider a full database restore--I don't recall why--perhaps it was because it would require all users to stop their work and we would have lost some sales data.  So I instead came up with the crazy idea of reading the activity in the SQL log files.  Like I said, insane.

So I found an application called SQL Log Rescue by RedGate Software that allowed me to view the raw activity in SQL Server log files.  I was able to open the latest log file, read all of the activity, see my fateful script that deleted all of the data.  I was also able to view the full data of the records that were deleted and generate SQL scripts that would re-insert the deleted data.  Miraculously, that crazy plan worked, and SQL Log Rescue saved me.  I was able to insert all of the data back into the Accounts Payables tables, and then restart my heart.

Thinking back on it, I suspect that the more proper approach would have been to do a SQL transaction log backup and then perform a proper point-in-time recovery of the entire database.  Or I could have restored to a new database and then copied the data from the restore into production.  But as Brent's stories also demonstrate, we don't always think clearly when working through a problem.
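For reference, a point-in-time recovery along those lines would look roughly like the T-SQL below.  This is only a sketch: I'm using the GP sample company database TWO as a stand-in, and the file paths and STOPAT timestamp are made-up placeholders; the actual sequence of log restores depends on the backups you have on hand.

-- Back up the tail of the log; NORECOVERY leaves the database in a restoring state
BACKUP LOG TWO TO DISK = 'D:\Backup\TWO_tail.trn' WITH NORECOVERY;

-- Restore the most recent full backup without recovering the database
RESTORE DATABASE TWO FROM DISK = 'D:\Backup\TWO_full.bak' WITH NORECOVERY;

-- Restore each subsequent log backup, stopping just before the accidental delete
RESTORE LOG TWO FROM DISK = 'D:\Backup\TWO_log_1.trn' WITH NORECOVERY;
RESTORE LOG TWO FROM DISK = 'D:\Backup\TWO_tail.trn'
WITH STOPAT = '2017-09-20 10:15:00', RECOVERY;

The alternative I mentioned, restoring the backups to a database with a different name (using RESTORE ... WITH MOVE) and copying the deleted rows back into production, avoids taking the live database offline, at the cost of writing the copy scripts yourself.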


So when you're planning your backup routines and disaster recovery scenarios, review the stories that Brent shares and see if your backup plans would handle each of them.  And then revisit them occasionally to make sure the backups are working and you are still able to handle those scenarios.


Steve Endow is a Microsoft MVP in Los Angeles.  He is the owner of Precipio Services, which provides Dynamics GP integrations, customizations, and automation solutions.


You can also find him on Twitter, YouTube, and Google+



Tuesday, September 19, 2017

eConnect error: The target principal name is incorrect. Cannot generate SSPI context.

By Steve Endow

A customer recently encountered this error with a Dynamics GP eConnect integration:


The target principal name is incorrect. Cannot generate SSPI context.

Just before this error was reported, a new version of a custom Dynamics GP AddIn had been deployed, so I got the support call, as the partner and customer thought the error was related to the new AddIn.

But this error is related to the eConnect user authentication with SQL Server, so deploying a new DLL shouldn't have affected that authentication.

I recommended that the customer's IT team check the status of the eConnect Windows service on the terminal server and try restarting it.  The eConnect service was running, but when they restarted the service, they received a login error.

It seems that some other process on the client's network was attempting to use the Active Directory account assigned to the eConnect service on the terminal server.  That other process, whatever it is, apparently has an invalid or old password for the domain account, so it was failing to log in and locking out the Active Directory account.

Once the account was locked, the eConnect service on the terminal server would begin receiving the SSPI context errors, as its authentication with SQL Server would fail once the account was locked.

The IT team had previously tried to reset the eConnect account password, but it would just get locked out again by the mystery app or process that was still trying to use the same domain account.  So I recommended that they create a new dedicated domain account for use by the eConnect Windows service on the terminal server.

Once they set up the new domain account and updated the eConnect Windows service to use the new account, the problem went away.

However, this morning the error seemed to occur again, but restarting the eConnect service appears to have resolved it.  Given this odd recurrence, there may be some other cause or detail contributing to the problem.

Steve Endow is a Microsoft MVP in Los Angeles.  He is the owner of Precipio Services, which provides Dynamics GP integrations, customizations, and automation solutions.

You can also find him on Twitter, YouTube, and Google+




Friday, September 8, 2017

Multiple hard drive failures on a Synology NAS: Lessons Learned

By Steve Endow

This is a long post, but I think the context and the entire story help paint a picture of how things can fail in unexpected and odd ways, and how storage failures can be more complicated to deal with than you might expect.  I learned several lessons so far, and I'm still in the middle of it, so I may learn more as things unfold.

On Tuesday evening, I received several emails from my backup software telling me that backup jobs had failed.  Some were from Veeam, my absolute favorite backup software, saying that my Hyper-V backups had failed.  Others were from Acronis True Image, saying that my workstation backup had failed.

Hmmm.


Based on the errors, it looked like both backup apps were unable to access my Synology NAS, where their backup files are stored.

That's odd.

When I tried to access the UNC path for my Synology on my Windows desktop, I got an error that the device could not be found.  Strange.

I then opened a web browser to login to the Synology.  But the login page wouldn't load.  I then checked to make sure the Synology was turned on.  Yup, the lights were on.

After several refreshes and a long delay, the login page eventually loaded, but I couldn't log in.  I then tried connecting over SSH using Putty.  I was able to connect, but it was VERY slow.  Like 30 seconds to get a login prompt, 30 seconds to respond after entering my username, etc.  I was eventually able to log in, so I tried rebooting the Synology from the SSH terminal.

After issuing the command for a reboot, the power light started blinking, but the unit didn't shut down.  Strangely, after issuing the shutdown command, I was able to log in to the web interface, but it was very slow and wasn't displaying properly.  I eventually had to hold the power button down for 10 seconds to hard reset the Synology, and then turned it back on.

After it rebooted, it seemed fine.  I was able to browse the shares and access the web interface.  Weird.

As a precaution, I submitted a support case with Synology asking them how I should handle this situation in the future and what might be causing it.  I didn't think it was a big deal.

On Wednesday evening, I got the same error emails from my backup software.  The backups had failed.  Again.  Once again, the Synology was unresponsive, so I went through the same process, and eventually had to hard reset it to login and get it working again.

So at this point, it seemed pretty clear that there was a real problem.  But it was late and I was tired, so I left it alone and planned to look into it in the morning.

On Thursday morning, the Synology was again unresponsive.  Fortunately, I received a response from Synology support and sent them a debug log that they had requested.  Within 30 minutes I received a reply, informing me that the likely issue was a bad disk.

Apparently the bad disk was causing the Synology to deal with read errors, and that was actually causing the Synology OS kernel to become unstable, or "kernel panic".


This news offered me two surprises.  First, I was surprised to learn that I had a bad disk.  Why hadn't I known that or noticed that?

Second, I was surprised to learn that a bad disk can make the Synology unstable.  I had assumed that a drive failure would be detected and the drive would be taken offline, or some equivalent.  I would not have guessed that a drive could fail in a way that would make the NAS effectively unusable.

After reviewing the logs, I found out why I didn't know I had a bad drive.


The log was filled with hundreds of errors, "Failed to send email".  Apparently the SMTP authentication had stopped working months ago, and I never noticed.  I get so much email that the absence of messages from the Synology didn't stand out.

The drive apparently started to have problems back in July, but up until this week, the Synology seemed to still work, so I had no reason to suspect a problem.

Synology support also informed me that the unit was running a "parity consistency check" to try and verify the data on all of the drives.  This process normally slows the unit down, and the bad drive makes the process painfully slow.

After a day and a half, the process is only 20% complete, so this is apparently going to take 4-5 more days.


So that's great and all, but if I know I have a bad drive, can't I just replace the drive now and get on with the recovery process?  Unfortunately, no.  Synology support said that I should wait for the parity consistency check to complete before pulling the bad drive, as the process is "making certain you are not suffering data/ volume corruption so you can later repair your volume with no issues."

Lovely.  So waiting for this process to complete is preventing me from replacing the bad drive that is causing the process to run so slowly.  And I'm going to have to wait for nearly a week to replace the drive, all the while hoping that the drive doesn't die completely.

I'm sensing that this process is less than ideal.  It's certainly much messier than what I would have expected from a RAID array drive failure.

But that's not all!  Nosiree!


In addition to informing me that I have a bad drive that is causing the Synology to become unusable, it turns out that I have a second drive that is starting to fail in a different manner.


Notice that Disk 6 has a Warning status?  That's actually the second bad drive.  The first bad drive is Disk 2, which shows a nice happy green "Normal" status.

After reviewing my debug log, Synology support warned me that Disk 6 is accumulating bad sectors.

Sure enough, 61 bad sectors.  Not huge, but a sign that there is a problem and it should probably be replaced.


Lovely.

So why did I not know about this problem?  Well, even if SMTP had been working properly on my Synology, it turns out that the bad sector warnings are not enabled by default on the Synology.  So you can have a disk failing and stacking up bad sectors, but you'd never know it.  So that was yet another thing I learned, and I have now enabled that warning.

Correction 1: I remembered that the monthly disk health report shows bad sectors, so if you have that report enabled, and if your SMTP email is working, you will see the bad sector count--assuming you review that email.

Correction 2: A reader noted that new Synology units or new DSM installs apparently do have the Bad Sector Warning notification enabled by default, and set with a default of 50 bad sectors as the threshold.  But if you have an existing / older Synology, it likely does not have the Bad Sector Warning enabled.


So, here's where I'm at.

I've fixed the email settings so that I am now getting email notifications.

I'm 20% into the parity consistency check, and will have to wait 5+ more days for that to finish.

As soon as I learned that I had 2 bad drives on Thursday morning, I ordered two replacement drives.  I paid $50 for overnight express shipment with morning delivery.  Because I wanted to replace the drives right away, right?  But that was before Synology emphasized that I should wait for the parity check to complete.  So those drives are going to sit in the box for a week--unless a drive dies completely in the meantime.

If the parity check does complete successfully, I'll be able to replace Drive 2, which is the one with the serious problems.  I'll then have to wait for the Synology to rebuild the array and populate that drive.

Once that is done, I'll be able to replace Drive 6, and wait for it to rebuild.

Great, all done, right?

Nope.  I'll need to hook up the two bad drives and run the manufacturer diagnostics and hopefully get clear evidence of an issue that allows me to RMA the drives.  Because I will want the extra drives.  If I can't get an RMA, I'll be buying at least 1 new drive.

This experience has made me think differently about NAS units.  My Synology has 8 drive bays, and I have 6 drives in it.  The Synology supports hot spare drives, so I will be using the additional drives to fill the other two bays and have at least one hot spare available, and most likely 2 hot spares.

Previously, I didn't think much of hot spares.  If a drive fails, RAID lets you limp along until you replace the bad drive right?  In concept.  But as I have experienced, a "drive failure" isn't always a nice clean drive death.  And this is the first time I've seen two drives in the same RAID array have issues.

And it's also shown me that when drives have issues, but don't fail outright, they can make the NAS virtually unusable for days.  I had never considered this scenario.  While I'm waiting to fix my main NAS, my local backups won't work.  And this Synology is also backing up its data to Backblaze B2 for my offsite backup.  That backup is also disabled while the parity check runs.  And I then have another on-site backup to a second Synology unit using HyperBackup.  Again, that backup is not working either.  So my second and third level backups are not available until I get my main unit fixed.

Do I redirect my backup software to save to my second Synology?  Will that mess up my backup history and backup chains?  I don't know.  I'll have to see if I can add secondary backup repositories to Veeam and Acronis and perhaps merge them later.

Another change I'll be making is to back up more data to my Backblaze B2 account.  I realized that I was only backing up some of the data from my main Synology to B2.  I'll now be backing up nearly everything to B2.

So this has all been much messier than I would have imagined.  Fortunately it hasn't been catastrophic, at least not yet.  Hopefully I can replace the drives and everything will be fine, but the process has made me realize that it's really difficult to anticipate the complications from storage failures.

Update:  It's now Monday morning (9/11/2017), 5 full days after the Synology was last rebooted and the parity consistency check was started, and it's only at 31%.  I did copy some files off of this unit to my backup Synology, which seems to pause or stop the parity check, but at this speed, it's going to take weeks to finish.  This long parity processing does seem to be a result of the bad Drive 2, as the parity consistency check recently ran on my other Synology in under a day.



Update 2: Tuesday morning, 9/12/2017.  The parity consistency check is at 33.4%.  Painfully slow.  My interpretation is that any task, job, process, or file operation on the Synology seems to pause or delay the parity consistency check.  I have now disabled all HyperBackup jobs, paused CloudSync, and stopped my Veeam backup jobs to minimize activity on the limping Synology.  I may turn off file sharing as well, just to ensure that network file access isn't interfering with the parity check process. 

I also just noticed that the File Services settings on the Synology show that SMB is not enabled. My understanding is that this is required for Windows file sharing, so I'm puzzled how I'm still able to access the Synology shares from my desktop.  I didn't turn it off, so I'm not sure if this is due to the Synology being in a confused state due to the drive issues, or something else.  I find it strange that my backup software is unable to access the Synology shares, but I'm able to eventually access them--although they are very slow to come up.


Update 3:  Monday, 9/18/2017 - The Saga Continues:  After thinking about it, I realized that the parity consistency check was likely triggered because I powered off the Synology before it shut down on its own.  At the time, I thought that the unit was hung or unresponsive, but I now realize that it was the bad disk that was causing the shutdown to take forever.  The parity check is estimated to take 2-4 years due to the bad drive, so I just shut the unit down to replace the bad drive.  It took 30-60 minutes for it to fully power down.  If you encounter an issue with a drive that causes the Synology to be slow or seem unresponsive, avoid doing a hard reset or hard shutdown on the unit.  Try the shutdown command and wait an hour or two to see if the shutdown eventually completes on its own.  This will hopefully allow you to avoid a parity consistency check, which is a major hassle with a bad drive.

Now that I've replaced the drive and powered the Synology on, the parity consistency check is still running, and I'm unable to add the replacement disk to my volume.  I've replied to Synology support on my existing case asking them how to cancel the parity consistency check and just add the replacement drive so that it can get started on the volume repair process.


Update 4:  9/18/2017: After replacing the bad drive, I see that the parity consistency check is running much faster and I may not have to cancel it.  With the bad drive, the process was estimated to take 2-4 years (yes YEARS) to complete, but with the new drive, it is currently estimating about 16 hours.  I'm going to let it run overnight and see how much progress it has made by tomorrow morning.


Update 5:  9/19/2017: The parity consistency check finally completed and the Synology began to beep every few seconds, indicating that the volume was "degraded".



Since the parity check was no longer running, the "Manage" button became active, and I was able to add the new drive to the volume and start the repair process, which was quite simple.




So the repair process is now running and it looks like it will take about 26 hours to complete.



Update 6:  9/20/2017:  The repair process appears to be going well and should complete today.


While the repair is running, I plugged the bad drive into my desktop and ran the HGST "DFT for Windows" diagnostic application to test the drive.  Interestingly, it is not detecting any problems.  On the extended tests, it appears to be hanging, but it isn't identifying a problem.


Final update: 9/22/2017:  I replaced the second bad drive and the Synology has repaired the volume.  Things are back to normal and working well.


I created RMAs for both of the HGST hard drives and mailed them back, so I should get replacements for those drives, which I'll install in the Synology as hot spares.



Steve Endow is a Microsoft MVP in Los Angeles.  He is the owner of Precipio Services, which provides Dynamics GP integrations, customizations, and automation solutions.

You can also find him on Twitter, YouTube, and Google+