The SAN Guy

tips and tricks from an IT veteran…

Celerra / VNX File replication job creation errors when selecting destination

I recently had an issue with setting up a new Celerra file system replication job. As soon as I selected the destination system I received the three errors below.

1) Query VDMs All. Cannot access any Data Mover on the remote system, hostname

Severity: Error
Brief Description: Cannot access any Data Mover on the remote system, hostname
Full Description: No Data Movers are available on the specified remote system to perform this operation
Recommended Action: 1) Check if the Data Movers on the specified remote system are accessible. 2) Ensure that the difference in system times between the local and remote Celerra systems or VNX systems does not exceed 10 minutes. Use NTP on the Control Stations to synchronize the system clocks. 3) Ensure that the passphrase on the local Control Station matches with the passphrase on the remote Control Station. 4) Ensure that the same local users that manage VNX for file systems exist on both the source and the destination Control Station. 5) Ensure that the global account is mapped to the same local account on both local and remote VNX Control Stations. Primus emc263860 provides more details.
Message ID: 13690601568

2) Query storage pools All. Execution failed: Segmentation fault: Operating system signal. [STRING.make_from_string]

Severity: Error
Brief Description: Execution failed: Segmentation fault: Operating system signal. [STRING.make_from_string]
Full Description: Operation failed for the reason described in the accompanying message.
Recommended Action: Correct the cause of the problem and repeat the operation.
Message ID: 13421840573

3) There are no destination pools available.

Severity: Info
Brief Description: There are no destination pools available.
Full Description: Destination side storage pools are not available.
Recommended Action: Check whether the storage pools have enough space.
Message ID: 26845970450

I was unable to determine the cause of the problem so I opened an SR with EMC.

It turns out there was a user discrepancy between the /etc/passwd file and the /nas/site/user_db file. This was causing the following error when checking the interconnect:

[nasadmin@celerra02 log]$ nas_cel -interconnect -l
Error 2237: Execution failed: Segmentation fault: Operating system signal. [STRING.make_from_string]

The output should look something like this:

[nasadmin@celerra02 log]$ nas_cel -interconnect -l
id    name          source_server destination_system destination_server
20001 loopback      server_2      DRSITE1            server_2
20003 SITE1VNX5500  server_2      DRSITE1            server_2
20004 SITE2NS960    server_2      DRSITE2            server_2
20007 SITE11NS960   server_2      DRSITE1            server_2
20005 SITE3VNX5500  server_2      DRSITE1            server_2
20006 SITE4VNX5500  server_2      DRSITE1            server_2
20008 SITE5NS120    server_2      DRSITE1            server_2
40001 loopback      server_4      DRSITE1            server_4
40003 SITE2NS40     server_4      DRSITE2            server_2

The problem was resolved by removing entries from /nas/site/user_db that were not in the /etc/passwd file. The root cause was a manual modification of the passwd file by a sysadmin: some old entries had been removed, and the matching changes were never made in the user_db file.
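
If you run into the same segmentation fault, a quick way to spot the mismatch before opening an SR is to compare the user names in the two files. This is just a minimal sketch run as root on the Control Station, and it assumes /nas/site/user_db is colon-delimited with the user name in the first field; verify the format on your own system (and back up user_db) before removing anything.

# Compare user names in /etc/passwd and /nas/site/user_db (assumes both are colon-delimited)
cut -d: -f1 /etc/passwd | sort -u > /tmp/passwd_users.txt
cut -d: -f1 /nas/site/user_db | sort -u > /tmp/userdb_users.txt

# Entries that appear only in user_db are the likely leftovers to clean up
comm -13 /tmp/passwd_users.txt /tmp/userdb_users.txt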

 

Scripting automatic reports for Isilon from the CLI

We recently set up a virtual demo of an Isilon system on our network as we are evaluating Isilon for a possible purchase.  You can obtain a virtual node that runs on ESX from your local EMC Isilon representative, along with temporary licenses to test everything out.  As part of the test I wanted to see if it was possible to create custom CLI scripts in the same way that I create them on the Celerra or VNX File.  On the Celerra, I run daily scripts that output file pool sizes, file systems & disk space, failover status, logs, health check info, checkpoint info, etc. to my web report page.  Can you do the same thing on an Isilon?

Well, to start with, Isilon’s commands are completely different.  The first step of course was to look at what was available to me.  All of the Isilon administration commands appear to begin with ‘isi’.  If you type isi by itself it will show you the basic list of commands and what they do:

isilon01-1% isi
Description:
OneFS cluster administration.
Usage:
isi  <subcommand>
[--timeout <integer>]
[{--help | -h}]
Subcommands:
Cluster Monitoring:
alert*           An alias for "isi events".
audit            Manage audit configuration.
events*          Manage cluster events.
perfstat*        View cluster performance statistics.
stat*            An alias for "isi status".
statistics*      View general cluster statistics.
status*          View cluster status.
Cluster Configuration:
config*          Manage general cluster settings.
email*           Configure email settings.
job              Isilon job management commands.
license*         Manage software licenses.
networks*        Manage network settings.
services*        Manage cluster services.
update*          Update OneFS system software.
pkg*             Manage OneFS system software patches.
version*         View system version information.
remotesupport    Manage remote support settings.
Hardware & Devices:
batterystatus*   View battery status.
devices*         Manage cluster hardware devices.
fc*              Manage Fibre Channel settings.
firmware*        Manage system firmware.
lun*             Manage iSCSI logical units (LUNs).
target*          Manage iSCSI targets.
readonly*        Toggle node read-write state.
servicelight*    Toggle node service light.
tape*            Manage tape and media changer devices.
File System Configuration:
get*             View file system object properties.
set*             Manage file system object settings.
quota            Manage SmartQuotas, notifications and reports.
smartlock*       Manage SmartLock settings.
domain*          Manage file system domains.
worm*            Manage SmartLock WORM settings.
dedupe           Manage Dedupe settings.
Access Management:
auth             Manage authentication, identities and role-based access.
zone             Manage access zones.
Data Protection:
avscan*          Manage antivirus service settings.
ndmp*            Manage backup (NDMP) settings.
snapshot         Manage SnapshotIQ file system snapshots and settings.
sync             SyncIQ management interface.
Protocols:
ftp*             Manage FTP settings.
hdfs*            Manage HDFS settings.
iscsi*           Manage iSCSI settings.
nfs              Manage NFS exports and protocol settings.
smb              Manage SMB shares and protocol settings.
snmp*            Manage SNMP settings.
Utilities:
exttools*        External tools.
Other Subcommands:
filepool         Manage filepools on the cluster.
storagepool      Configure and monitor storage pools
Options:
Display Options:
--timeout <integer>
Number of seconds for a command timeout.
--help | -h
Display help for this command.

Note that subcommands or actions marked with an asterisk (*) require root login.  If you log in with a normal admin account you’ll get the following error message when you run the command:

isilon01-1% isi status
Commands not enabled for role-based administration require root user access.

Isilon’s OneFS operating system is based on FreeBSD as opposed to Linux for the Celerra/VNX DART OS.   Since it’s a unix based OS with console access, you can create shell scripts and cron job schedules just like the Celerra/VNX File.

Because all of the commands I want to run require root access, I had to create the scripts logged in as root.  Be careful doing this!  This is only a test for me; I would look for a workaround on a production system.  Knowing that I’d be FTPing the output files to my web report server, I started by creating the .netrc file in the /root folder.  This is where the default login and password for the FTP server are stored.  Permissions must be changed for it to work, so use chmod 600 on the file after you create it.  It didn’t work for me at first because the syntax is different on FreeBSD than on Linux, so looking at my Celerra notes didn’t help (for Celerra/VNX File I used “machine <ftp_server_name> login <ftp_login_id> password <ftp_password>”).

For Isilon, the correct syntax is this:

default login <ftp_username> password <ftp_password>
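
For reference, creating that file on the Isilon node looked roughly like this; the login and password below are made-up placeholders, not real values:

# Create /root/.netrc with the FTP credentials and lock down its permissions
cat > /root/.netrc << 'EOF'
default login reportftp password MySecretPass1
EOF
chmod 600 /root/.netrc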
 

The script that FTP’s the files would then look like this for Isilon:

HOST="10.1.1.1"
ftp $HOST <<SCRIPT
put /root/isilon_stats.txt
put /root/isilon_repl.txt
put /root/isilon_df.txt
put /root/isilon_dedupe.txt
put /root/isilon_perf.txt
SCRIPT
 

For this demo, I created a script that generates reports for file system utilization, deduplication status, performance stats, replication stats, and array status.  For the file system utilization report, I used two different methods as I wasn’t sure which I’d like better.  Using ‘df -h -a /ifs’ will get you similar information to ‘isi storagepool list’, but the output format is different.  I used cron to schedule the job directly on the Isilon.
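
As an example, the root crontab entries below would run the reporting script at 6:00 AM and the FTP script ten minutes later; the script names and times are just placeholders for however you save and schedule yours:

# Edit root's crontab with 'crontab -e' and add entries like these
# (minute hour day-of-month month day-of-week command)
0 6 * * * /root/isilon_report.sh
10 6 * * * /root/isilon_ftp.sh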

Here is the reporting script:

#!/bin/sh
# Daily Isilon report script - each report is written to its own text file in /root
TODAY=$(date)
HOST=$(hostname)
sleep 15
# Cluster status
echo "---------------------------------------------------------------------------------" > /root/isilon_stats.txt
echo "Date: $TODAY  Host:  $HOST" >> /root/isilon_stats.txt
echo "---------------------------------------------------------------------------------" >> /root/isilon_stats.txt
/usr/bin/isi status >> /root/isilon_stats.txt
# Replication (SyncIQ) reports
echo "---------------------------------------------------------------------------------" > /root/isilon_repl.txt
echo "Date: $TODAY  Host:  $HOST" >> /root/isilon_repl.txt
echo "---------------------------------------------------------------------------------" >> /root/isilon_repl.txt
/usr/bin/isi sync reports list >> /root/isilon_repl.txt
# File system utilization
echo "---------------------------------------------------------------------------------" > /root/isilon_df.txt
echo "Date: $TODAY  Host:  $HOST" >> /root/isilon_df.txt
echo "---------------------------------------------------------------------------------" >> /root/isilon_df.txt
df -h -a /ifs >> /root/isilon_df.txt
echo " " >> /root/isilon_df.txt
/usr/bin/isi storagepool list >> /root/isilon_df.txt
# Deduplication status
echo "---------------------------------------------------------------------------------" > /root/isilon_dedupe.txt
echo "Date: $TODAY  Host:  $HOST" >> /root/isilon_dedupe.txt
echo "---------------------------------------------------------------------------------" >> /root/isilon_dedupe.txt
/usr/bin/isi dedupe stats  >> /root/isilon_dedupe.txt
# Performance statistics
echo "---------------------------------------------------------------------------------" > /root/isilon_perf.txt
echo "Date: $TODAY  Host:  $HOST" >> /root/isilon_perf.txt
echo "---------------------------------------------------------------------------------" >> /root/isilon_perf.txt
sleep 1
echo "  " >> /root/isilon_perf.txt
echo "--System Stats--" >> /root/isilon_perf.txt
echo "  " >> /root/isilon_perf.txt
/usr/bin/isi statistics system  >> /root/isilon_perf.txt
sleep 1
echo "  " >> /root/isilon_perf.txt
echo "--Client Stats--" >> /root/isilon_perf.txt
echo "  " >> /root/isilon_perf.txt
/usr/bin/isi statistics client  >> /root/isilon_perf.txt
sleep 1
echo "  " >> /root/isilon_perf.txt
echo "--Protocol Stats--" >> /root/isilon_perf.txt
echo "  " >> /root/isilon_perf.txt
/usr/bin/isi statistics protocol  >> /root/isilon_perf.txt
sleep 1
echo "  " >> /root/isilon_perf.txt
echo "--Protocol Data--" >> /root/isilon_perf.txt
echo "  " >> /root/isilon_perf.txt
/usr/bin/isi statistics pstat  >> /root/isilon_perf.txt
sleep 1
echo "  " >> /root/isilon_perf.txt
echo "--Drive Stats--" >> /root/isilon_perf.txt
echo "  " >> /root/isilon_perf.txt
/usr/bin/isi statistics drive  >> /root/isilon_perf.txt
 

Once the output files are FTP’d to the web server, I have a basic HTML page that uses iframes to show the text files.  The web page is then automatically updated as soon as the new text files are FTP’d.  Below is a screenshot of my demo report page.  It doesn’t show the entire page, but you’ll get the idea.

[Screenshot: Isilon report web page]

Easy reporting on data domain using the autosupport log

I was looking for an easy (and free) way to do some daily reporting on our data domain hardware.  I was most interested in reporting on disk space, but decided to gather some other data as well.  The easiest method I found is to use the info collected in the autosupport log.  First, I’ll explain how to automatically gather the autosupport log from your data domain, and then move on to how to pull only the information you want from it for reporting purposes.

First, you need to enable ftp on the data domain and add the IP address of the FTP server you’re going to use to pull the file.  Here are the commands you need to run on your data domain:

adminaccess enable ftp
adminaccess add ftp 10.1.1.1 
 

Next, you’ll want to pull the files from the data domain.  I am using a windows FTP server to pull the autosupport files.  The windows batch script I use is scheduled to run once a day and is listed below.  I use the ftp command and call a text file with the specific IP and login info for each data domain box.

ftp -s:10.1.1.2.txt
ftp -s:10.1.1.3.txt
ftp -s:10.2.1.2.txt
ftp -s:10.2.1.3.txt
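
The batch file itself can be scheduled to run once a day with the Windows Task Scheduler; a command along these lines would do it (the task name and batch file path are placeholders for my setup):

schtasks /create /tn "DD Autosupport Pull" /tr "e:\reports\dd\pull_autosupport.bat" /sc daily /st 06:00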
 

The text files that the ftp script calls (that are named <ipaddress.txt>) look like this:

open 10.1.1.2
sysadmin
<password>
cd support
lcd e:\reports\dd\SITE1_DD_01
get autosupport
quit
 

Once you’ve downloaded the autosupport file, you can run scripts against it to pull out only the info you’re interested in.  I prefer to use unix shell scripts for parsing files because of the extra functionality over standard windows batch scripts.  I have Cygwin installed on my web server to run these shell scripts.

If you take a look at the autosupport log, you’ll notice that it’s very long and contains some duplicate characters in multiple areas, which makes using grep, awk, and sed a bit more challenging.  It took a bit of trial and error, but the scripts below will pull only the specific areas I wanted:  System Alerts, File system compression, Locked files, Replication info, File distribution, and Disk space information.

The first script below will gather disk space information using grep.  Spacing inside the quotes is very important in this case, as I was looking for specific unique strings within the autosupport log, and using grep for those unique strings produces the correct output.  In this example, I am gathering info for four data domain boxes.  For those unfamiliar with Cygwin, you can access Windows drive letters by navigating to /cygdrive/<drive letter>.  To view the root of the C drive, you would type ‘cd /cygdrive/c’ from the CLI.  In this case, my Windows batch script that pulls the autosupport files places them in the e:\reports\dd folder, which is /cygdrive/e/reports/dd when using the Cygwin shell.

Here is the script:

# Site 1 Data Domain 01
cat /cygdrive/e/reports/dd/SITE1_DD_01/autosupport | grep "=  SERVE" > /cygdrive/e/reports/dd/SITE1_DD_01/SITE1_DD_01_diskspace.txt
cat /cygdrive/e/reports/dd/SITE1_DD_01/autosupport | grep "Active Tier:" >> /cygdrive/e/reports/dd/SITE1_DD_01/SITE1_DD_01_diskspace.txt
cat /cygdrive/e/reports/dd/SITE1_DD_01/autosupport | grep "Resource           Size" >> /cygdrive/e/reports/dd/SITE1_DD_01/SITE1_DD_01_diskspace.txt
cat /cygdrive/e/reports/dd/SITE1_DD_01/autosupport | grep "/data:" >> /cygdrive/e/reports/dd/SITE1_DD_01/SITE1_DD_01_diskspace.txt
cat /cygdrive/e/reports/dd/SITE1_DD_01/autosupport | grep "/ddvar     " >> /cygdrive/e/reports/dd/SITE1_DD_01/SITE1_DD_01_diskspace.txt

# Site 1 Data Domain 02
cat /cygdrive/e/reports/dd/SITE1_DD_02/autosupport | grep "=  SERVE" > /cygdrive/e/reports/dd/SITE1_DD_02/SITE1_DD_02_diskspace.txt
cat /cygdrive/e/reports/dd/SITE1_DD_02/autosupport | grep "Active Tier:" >> /cygdrive/e/reports/dd/SITE1_DD_02/SITE1_DD_02_diskspace.txt
cat /cygdrive/e/reports/dd/SITE1_DD_02/autosupport | grep "Resource           Size" >> /cygdrive/e/reports/dd/SITE1_DD_02/SITE1_DD_02_diskspace.txt
cat /cygdrive/e/reports/dd/SITE1_DD_02/autosupport | grep "/data:" >> /cygdrive/e/reports/dd/SITE1_DD_02/SITE1_DD_02_diskspace.txt
cat /cygdrive/e/reports/dd/SITE1_DD_02/autosupport | grep "/ddvar     " >> /cygdrive/e/reports/dd/SITE1_DD_02/SITE1_DD_02_diskspace.txt

# Site 2 Data Domain 01
cat /cygdrive/e/reports/dd/Site2_DD_01/autosupport | grep "=  SERVE" > /cygdrive/e/reports/dd/Site2_DD_01/Site2_DD_01_diskspace.txt
cat /cygdrive/e/reports/dd/Site2_DD_01/autosupport | grep "Active Tier:" >> /cygdrive/e/reports/dd/Site2_DD_01/Site2_DD_01_diskspace.txt
cat /cygdrive/e/reports/dd/Site2_DD_01/autosupport | grep "Resource           Size" >> /cygdrive/e/reports/dd/Site2_DD_01/Site2_DD_01_diskspace.txt
cat /cygdrive/e/reports/dd/Site2_DD_01/autosupport | grep "/data:" >> /cygdrive/e/reports/dd/Site2_DD_01/Site2_DD_01_diskspace.txt
cat /cygdrive/e/reports/dd/Site2_DD_01/autosupport | grep "/ddvar     " >> /cygdrive/e/reports/dd/Site2_DD_01/Site2_DD_01_diskspace.txt

# Site 2 Data Domain 02
cat /cygdrive/e/reports/dd/Site2_DD_02/autosupport | grep "=  SERVE" > /cygdrive/e/reports/dd/Site2_DD_02/Site2_DD_02_diskspace.txt
cat /cygdrive/e/reports/dd/Site2_DD_02/autosupport | grep "Active Tier:" >> /cygdrive/e/reports/dd/Site2_DD_02/Site2_DD_02_diskspace.txt
cat /cygdrive/e/reports/dd/Site2_DD_02/autosupport | grep "Resource           Size" >> /cygdrive/e/reports/dd/Site2_DD_02/Site2_DD_02_diskspace.txt
cat /cygdrive/e/reports/dd/Site2_DD_02/autosupport | grep "/data:" >> /cygdrive/e/reports/dd/Site2_DD_02/Site2_DD_02_diskspace.txt
cat /cygdrive/e/reports/dd/Site2_DD_02/autosupport | grep "/ddvar     " >> /cygdrive/e/reports/dd/Site2_DD_02/Site2_DD_02_diskspace.txt

# Copy the reports to the web server directory.
cp /cygdrive/e/reports/dd/SITE1_DD_01/SITE1_DD_01_diskspace.txt /cygdrive/c/inetpub/wwwroot/
cp /cygdrive/e/reports/dd/SITE1_DD_02/SITE1_DD_02_diskspace.txt /cygdrive/c/inetpub/wwwroot/
cp /cygdrive/e/reports/dd/Site2_DD_01/Site2_DD_01_diskspace.txt /cygdrive/c/inetpub/wwwroot/
cp /cygdrive/e/reports/dd/Site2_DD_02/Site2_DD_02_diskspace.txt /cygdrive/c/inetpub/wwwroot/
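
Since the same five grep lines are repeated for every box, the script could also be collapsed into a loop. This is just an optional sketch that assumes the same one-folder-per-array layout used above:

# Optional: build the same disk space reports with a loop instead of repeated blocks
for DD in SITE1_DD_01 SITE1_DD_02 Site2_DD_01 Site2_DD_02; do
  SRC=/cygdrive/e/reports/dd/$DD/autosupport
  OUT=/cygdrive/e/reports/dd/$DD/${DD}_diskspace.txt
  grep "=  SERVE" $SRC > $OUT
  grep "Active Tier:" $SRC >> $OUT
  grep "Resource           Size" $SRC >> $OUT
  grep "/data:" $SRC >> $OUT
  grep "/ddvar     " $SRC >> $OUT
  cp $OUT /cygdrive/c/inetpub/wwwroot/
done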
 

For the remaining reports, I used the ‘sed’ command rather than ‘grep’.  As an example, I’ll explain how I stripped out the information for the System Alerts section.  In order to strip it out, I use sed to start cutting text when it reaches ‘Current Alerts’ and stop cutting when it reaches ‘There are’.  This same process is repeated to pull the information for the other reports as well (alerts, compression, locked files, replication, distribution).

This is what the Current Alerts section looks like in the autosupport log:

Current Alerts
--------------
Alert Id   Time                       Severity   Class               Object          Message                                                             
--------   ------------------------   --------   -----------------   -------------   ---------------------------------------------------------------------
774        Wed Dec 19 11:32:19 2012   CRITICAL   SystemMaintenance                   Core dump capability is now disabled due to lack of space in /ddvar.
807        Sun Jan 20 09:07:18 2013   WARNING    Replication         context=3       repl ctx 3: Sync-as-of time is more than 461 hours ago.             
808        Sun Jan 20 09:07:18 2013   WARNING    Replication         context=4       repl ctx 4: Sync-as-of time is more than 461 hours ago.             
809        Sun Jan 20 09:07:19 2013   WARNING    Replication         context=5       repl ctx 5: Sync-as-of time is more than 461 hours ago.              
814        Mon Jan 28 23:20:45 2013   CRITICAL   Filesystem          FilesysType=2   Space usage in Data Collection has exceeded 95% threshold.          
820        Wed Jan 30 15:52:40 2013   WARNING    Replication         context=2       repl ctx 2: Sync-as-of time is more than 148 hours ago.             
824        Mon Feb  4 12:36:39 2013   WARNING    Replication         context=10      repl ctx 10: Sync-as-of time is more than 24 hours ago.             
825        Wed Feb  6 02:13:51 2013   WARNING    Replication         context=19      repl ctx 19: Sync-as-of time is more than 24 hours ago.             
--------   ------------------------   --------   -----------------   -------------   ---------------------------------------------------------------------
There are 8 active alerts.
 

In this case, sed searches for ‘Current Alerts’ and copies all of the text until it reaches the string ‘There are’, at which point it stops and outputs the file.  The output files are written directly to the wwwroot folder on my internal reporting web page.  Each page encapsulates the output files in an iframe, so the web page is automatically updated every morning when the script runs.

Here is the second script that captures the remaining data:

#For Alerts
sed -n '/Current Alerts/,/There are/p' /cygdrive/e/reports/dd/SITE1_DD_01/autosupport > /cygdrive/c/inetpub/wwwroot/SITE1_DD_01_alerts.txt
sed -n '/Current Alerts/,/There are/p' /cygdrive/e/reports/dd/SITE1_DD_02/autosupport > /cygdrive/c/inetpub/wwwroot/SITE1_DD_02_alerts.txt
sed -n '/Current Alerts/,/There are/p' /cygdrive/e/reports/dd/Site2_DD_01/autosupport > /cygdrive/c/inetpub/wwwroot/Site2_DD_01_alerts.txt
sed -n '/Current Alerts/,/There are/p' /cygdrive/e/reports/dd/Site2_DD_02/autosupport > /cygdrive/c/inetpub/wwwroot/Site2_DD_02_alerts.txt

#For Filesystem Compression
sed -n '/Filesys Compression/,/((Pre-/p' /cygdrive/e/reports/dd/SITE1_DD_01/autosupport > /cygdrive/c/inetpub/wwwroot/SITE1_DD_01_filecomp.txt
sed -n '/Filesys Compression/,/((Pre-/p' /cygdrive/e/reports/dd/SITE1_DD_02/autosupport > /cygdrive/c/inetpub/wwwroot/SITE1_DD_02_filecomp.txt
sed -n '/Filesys Compression/,/((Pre-/p' /cygdrive/e/reports/dd/Site2_DD_01/autosupport > /cygdrive/c/inetpub/wwwroot/Site2_DD_01_filecomp.txt
sed -n '/Filesys Compression/,/((Pre-/p' /cygdrive/e/reports/dd/Site2_DD_02/autosupport > /cygdrive/c/inetpub/wwwroot/Site2_DD_02_filecomp.txt

#For Locked Files:
sed -n '/Locked files/,/Active/p' /cygdrive/e/reports/dd/SITE1_DD_01/autosupport > /cygdrive/c/inetpub/wwwroot/SITE1_DD_01_lockedfiles.txt
sed -n '/Locked files/,/Active/p' /cygdrive/e/reports/dd/SITE1_DD_02/autosupport > /cygdrive/c/inetpub/wwwroot/SITE1_DD_02_lockedfiles.txt
sed -n '/Locked files/,/Active/p' /cygdrive/e/reports/dd/Site2_DD_01/autosupport > /cygdrive/c/inetpub/wwwroot/Site2_DD_01_lockedfiles.txt
sed -n '/Locked files/,/Active/p' /cygdrive/e/reports/dd/Site2_DD_02/autosupport > /cygdrive/c/inetpub/wwwroot/Site2_DD_02_lockedfiles.txt

#Repl Transferred over 24 hrs
sed -n '/Replication Data Transferred/,/(sum)/p' /cygdrive/e/reports/dd/SITE1_DD_01/autosupport > /cygdrive/c/inetpub/wwwroot/SITE1_DD_01_repl.txt
sed -n '/Replication Data Transferred/,/(sum)/p' /cygdrive/e/reports/dd/SITE1_DD_02/autosupport > /cygdrive/c/inetpub/wwwroot/SITE1_DD_02_repl.txt
sed -n '/Replication Data Transferred/,/(sum)/p' /cygdrive/e/reports/dd/Site2_DD_01/autosupport > /cygdrive/c/inetpub/wwwroot/Site2_DD_01_repl.txt
sed -n '/Replication Data Transferred/,/(sum)/p' /cygdrive/e/reports/dd/Site2_DD_02/autosupport > /cygdrive/c/inetpub/wwwroot/Site2_DD_02_repl.txt

#File Distribution
sed -n '/File Distribution/,/> 500 GiB/p' /cygdrive/e/reports/dd/SITE1_DD_01/autosupport > /cygdrive/c/inetpub/wwwroot/SITE1_DD_01_filedist.txt
sed -n '/File Distribution/,/> 500 GiB/p' /cygdrive/e/reports/dd/SITE1_DD_02/autosupport > /cygdrive/c/inetpub/wwwroot/SITE1_DD_02_filedist.txt
sed -n '/File Distribution/,/> 500 GiB/p' /cygdrive/e/reports/dd/Site2_DD_01/autosupport > /cygdrive/c/inetpub/wwwroot/Site2_DD_01_filedist.txt
sed -n '/File Distribution/,/> 500 GiB/p' /cygdrive/e/reports/dd/Site2_DD_02/autosupport > /cygdrive/c/inetpub/wwwroot/Site2_DD_02_filedist.txt
 

When all is said and done, the report output looks like this:

Alerts:

Current Alerts
--------------
Alert Id   Time                       Severity   Class         Object          Message                                                   
--------   ------------------------   --------   -----------   -------------   ----------------------------------------------------------
1157       Wed Feb  5 12:23:19 2014   WARNING    Replication   context=3       Sync-as-of time is more than 24 hours ago.                
1161       Sun Feb  9 08:33:48 2014   CRITICAL   Filesystem    FilesysType=2   Space usage in Data Collection has exceeded 95% threshold.
--------   ------------------------   --------   -----------   -------------   ----------------------------------------------------------
There are 2 active alerts.

Filesystem Compression:

Filesys Compression
--------------                  
From: 2014-03-06 05:00 To: 2014-03-13 06:00

                  Pre-Comp   Post-Comp   Global-Comp   Local-Comp      Total-Comp
                     (GiB)       (GiB)        Factor       Factor          Factor
                                                                    (Reduction %)
---------------   --------   ---------   -----------   ----------   -------------
Currently Used:   495122.5    122948.5             -            -     4.0x (75.2)
Written:*                                                                        
  Last 7 days       9204.2      5656.6          1.1x         1.5x     1.6x (38.5)
  Last 24 hrs         74.1        36.4          1.0x         2.0x     2.0x (50.9)
---------------   --------   ---------   -----------   ----------   -------------
 * Does not include the effects of pre-comp file deletes/truncates
   since the last cleaning on 2014/03/11 02:07:50.

Replications:

Replication Data Transferred over 24hr
--------------------------------------
Directory/MTree Replication:
Date         Time         CTX   Pre-Comp (KB)   Pre-Comp (KB)           Replicated (KB)   Low-bw-   Sync-as-of      
                                      Written       Remaining    Pre-Comp       Network     optim   Time            
----------   --------   -----   -------------   -------------   -----------------------   -------   ----------------
2014/03/13   06:36:18       2     104,186,675      86,628,349   55,522,200   37,756,047      1.00   Thu Mar 13 05:53
                            3               0               0            0            0      0.00   Tue Feb  4 12:00
                            4               0               0            0        4,882      0.00   Thu Mar 13 06:00
                            8               0               0            0        5,120      0.00   Thu Mar 13 06:00
                           29               0               0            0        5,098      0.00   Thu Mar 13 06:00
                        (sum)     104,186,675                   55,522,200   37,771,148      1.00  

File Distribution:

File Distribution
-----------------
169,432 files in 8,691 directories

                          Count                         Space
               -----------------------------   --------------------------
         Age         Files       %    cumul%        GiB       %    cumul%
   ---------   -----------   -----   -------   --------   -----   -------
       1 day            13     0.0       0.0       74.4     0.0       0.0
      1 week           223     0.1       0.1     2133.7     0.4       0.4
     2 weeks         5,103     3.0       3.2    39121.2     7.8       8.3
     1 month        12,819     7.6      10.7    79336.5    15.9      24.1
    2 months        21,853    12.9      23.6   153873.8    30.8      54.9
    3 months         5,743     3.4      27.0    45154.6     9.0      64.0
    6 months         3,402     2.0      29.0    13674.3     2.7      66.7
      1 year        17,035    10.1      39.1    72937.7    14.6      81.3
    > 1 year       103,241    60.9     100.0    93461.7    18.7     100.0
   ---------   -----------   -----   -------   --------   -----   -------

                          Count                         Space
               -----------------------------   --------------------------
        Size         Files       %    cumul%        GiB       %    cumul%
   ---------   -----------   -----   -------   --------   -----   -------
       1 KiB        11,257     6.6       6.6        0.0     0.0       0.0
      10 KiB        44,396    26.2      32.8        0.3     0.0       0.0
     100 KiB        38,488    22.7      55.6        1.3     0.0       0.0
     500 KiB         9,652     5.7      61.3        2.1     0.0       0.0
       1 MiB        27,460    16.2      77.5       70.4     0.0       0.0
       5 MiB        12,136     7.2      84.6       27.3     0.0       0.0
      10 MiB         3,861     2.3      86.9       26.9     0.0       0.0
      50 MiB         4,367     2.6      89.5       96.6     0.0       0.0
     100 MiB           853     0.5      90.0       58.2     0.0       0.1
     500 MiB           861     0.5      90.5      201.0     0.0       0.1
       1 GiB           495     0.3      90.8      309.7     0.1       0.2
       5 GiB           567     0.3      91.1     1460.2     0.3       0.5
      10 GiB           336     0.2      91.3     2574.1     0.5       1.0
      50 GiB        14,691     8.7     100.0   494083.5    98.9      99.8
     100 GiB            11     0.0     100.0      683.3     0.1     100.0
     500 GiB             1     0.0     100.0      173.0     0.0     100.0
   > 500 GiB             0     0.0     100.0        0.0     0.0     100.0

Disk Space:

==========  SERVER USAGE   ==========
Active Tier:
Resource           Size GiB   Used GiB   Avail GiB   Use%   Cleanable GiB
/data: pre-comp           -   245211.8           -      -               -
/data: post-comp    51484.9    31811.0     19673.9    62%            35.4
/ddvar                 78.7        7.9        66.8    11%               -

Comparing Dot Hill and EMC Auto Tiering

The problem with auto tiering in general is that data generating a large amount of hot I/O is often “hot” only for a brief moment in time. Workloads are not uniform across an entire data set, and if the data isn’t moved in real time it’s very likely that hot data will be accessed from capacity storage.  Ideally a storage system would be able to react to these shifts in real time, but the cost in overhead to the storage processors is generally too great.  The problem is somewhat mitigated by having a large cache (like EMC’s FAST Cache), but the ability to automatically tier data in real time would be ideal.

I recently did a comparison of Dot Hill’s auto tiering strategy to EMC’s, and found that Dot Hill takes a unique approach that looks promising.  Here’s a brief comparison between the two.

EMC

EMC greatly improved FAST VP on the VNX2.  While the VNX uses 1GB data slices, the VNX2 uses more granular 256MB data slices, which greatly improves efficiency.  EMC is also shipping MLC SSD instead of SLC SSD, which makes using SSDs much more affordable in FAST VP pools (note that SLC is still required for FAST Cache).

How does the smaller data slice size improve efficiency?  As an example, assume that a 500MB contiguous range of hot data residing on a SAS tier needs to be moved to EFD.  On the VNX1, an entire 1GB slice would be moved.  If a 100GB EFD drive fills up with these “hot” 1GB slices, only 50GB of the data sitting on it is actually hot.  This is obviously inefficient, as 50% of the data is cold.  With the VNX2’s 256MB data slice size, only 512MB (two slices) would be moved, resulting in nearly 100% efficiency for that block of data.  The VNX2 makes much more efficient use of the extreme performance and performance tiers.

EMC’s FAST VP auto tiering on the VNX1 is configured either manually or by creating a schedule.  The schedule can be set to run 24 hours a day to move data in near real time, but in practice that isn’t workable.  The overhead on the storage processors is simply too great, so we’ve configured it to run during off-peak hours.  On our busiest VNX1 we see the storage processors jump from ~50% utilization to ~75% utilization when the relocation job is running.  This may improve with the VNX2, but it’s been a problem on the CX series and the VNX1.

DotHill

According to Dot Hill, their auto tiering doesn’t look at every single IO like the VNX or VNX2; it looks for trends in how data is accessed.  Their rep told me to think of it as examining every 10th IO rather than every single IO.  The idea is to allow the array to move data in real time without overloading the storage processors.  Dot Hill also moves data in 4MB data slices (which is very efficient, as I explained earlier when discussing the VNX2), and will not move more than 80 MB in a 5 second time span (a maximum of 960MB/minute) to keep the CPU load down.

So, how does the Dot Hill auto tiering actually work?  It uses scoring, scanning, and sorting, each of which is a separate process that works in real time.  Scoring is used to maintain a current page ranking on every I/O, using a process that adds less than one microsecond of overhead.  The algorithm looks at how frequently and how recently the data was accessed; higher scores are given to data that is more frequently and recently accessed.  Scanning for the high scoring pages happens every 5 seconds, and the scanning process uses less than 1.0% of the CPU.  The pages with the highest scores become candidates for promotion to SSD.  Sorting is the process that actually moves or migrates the pages up or down based on their score.  As I mentioned earlier, no more than 80 MB of data is moved during any 5 second sort to minimize the overall performance impact.

Summary

I haven’t used EMC’s new VNX2 or Dot Hill’s AssuredSAN myself, so I can’t offer any real-world experience with either.  I think Dot Hill’s implementation looks very promising on paper, and I look forward to reading more customer experiences about it in the future.  They’ve been around a long time, but they only recently started offering their products directly to customers, as they’ve primarily been an OEM storage manufacturer.  As I mentioned earlier, my experience with EMC’s FAST VP on the CX series and VNX1 shows that it consumes too many CPU cycles to be run in real time; we have always run it as an off-hours process, and that is exactly what Dot Hill’s implementation is trying to address.  We’ve made adjustments to the FAST VP relocation schedule based on monitoring our workload.  We also use FAST Cache, which at least partially solves the problem of suddenly hot data needing extra IO, and FAST Cache and FAST VP work very well together.  Overall I’ve been happy with EMC’s implementation, but it’s good to see another company taking a different approach that could be very competitive with EMC.

You can read more about Dot Hill’s auto tiering here:

http://www.dothill.com/wp-content/uploads/2012/08/RealStorWhitePaper8.14.12.pdf

You can read more about EMC’s VNX1 FAST VP Here:  

https://www.emc.com/collateral/software/white-papers/h8058-fast-vp-unified-storage-wp.pdf

You can read more about EMC’s VNX2 FAST VP Here:

https://www.emc.com/collateral/white-papers/h12208-vnx-multicore-fast-cache-wp.pdf

What is VPLEX?

We are looking at implementing a storage virtualization device and I started doing a bit of research on EMC’s product offering.  Below is a summary of some of the information I’ve gathered, including a description of what VPLEX does as well as some pros and cons of implementing it.  This is all info I’ve gathered by reading various blogs, looking at EMC documentation and talking to our local EMC reps.  I don’t have any first-hand experience with VPLEX yet.

What is VPLEX?

VPLEX at its core is a storage virtualization appliance. It sits between your arrays and hosts and virtualizes the presentation of storage arrays, including non-EMC arrays.  Instead of presenting storage to the host directly you present it to the VPLEX. You then configure that storage from within the VPLEX and then zone the VPLEX to the host.  Basically, you attach any storage to it, and like in-band virtualization devices, it virtualizes and abstracts them.

There are three VPLEX product offerings, Local, Metro, and Geo:

Local.  VPLEX Local manages multiple heterogeneous arrays from a single interface within a single data center location. VPLEX Local allows increased availability, simplified management, and improved utilization across multiple arrays.

Metro.  VPLEX Metro with AccessAnywhere enables active-active, block level access to data between two sites within synchronous distances.  Host application stability needs to be considered; depending on the application, it is generally recommended that Metro links stay at or below 5ms of latency.  The combination of virtual storage with VPLEX Metro and virtual servers allows for the transparent movement of VMs and storage across longer distances and improves utilization across heterogeneous arrays and multiple sites.

Geo.  VPLEX Geo with AccessAnywhere enables active-active, block level access to data between two sites within asynchronous distances. Geo improves the cost efficiency of resources and power.  It provides the same distributed device flexibility as Metro but extends the distance up to 50ms of network latency. 

Here are some links to VPLEX content from EMC, where you can learn more about the product:

What are some advantages of using VPLEX? 

1. Extra Cache and Increased IO.  VPLEX has a large cache (64GB per node) that sits in-between the host and the array. It offers additional read cache that can greatly improve read performance on databases because the additional cache is offloaded from the individual arrays.

2. Enhanced options for DR with RecoverPoint. The DR benefits are increased when integrating RecoverPoint with VPLEX Metro or Geo to replicate the data using real time replication. It includes a capacity based journal for very granular rollback capabilities (think of it as a DVR for the data center).  You can also use the native bandwidth reduction features (compression & deduplication) or disable them if you have WAN optimization devices installed, like those from Riverbed.  If you want active/active read/write access to data across a large distance, VPLEX is your only option; NetApp’s V-Series and HDS USPV can’t do it unless they are in the same data center. Here are a few more advantages:

  • DVR-like recovery to any point in time
  • Dynamic synchronous and asynchronous replication
  • Customized recovery point objectives that support any-to-any storage arrays
  • WAN bandwidth reduction of up to 90% of changed data
  • Non-disruptive DR testing

3. Non-disruptive data mobility & reduced maintenance costs. One of the biggest benefits of virtualizing storage is that you’ll never have to take downtime for a migration again. It can take months to migrate production systems, and without virtualization downtime is almost always required. Migration is also expensive: it takes a great deal of resources from multiple groups, plus the cost of keeping the older array on the floor during the process, and overlapping maintenance costs add up quickly.  By shortening the migration timeframe, hardware maintenance costs will drop, saving money.  Maintenance can be a significant part of the storage TCO, especially if the arrays are older or are going to be used for a longer period of time.  Virtualization can be a great way to reduce those costs and improve the return on assets over time.

4. Flexibility based on application IO.  The ability to move and balance LUN I/O among multiple smaller arrays non-disruptively allows you to balance workloads and increase your ability to respond to performance demands quickly.  Note that underlying LUNs can be aggregated or simply passed through the VPLEX.

5. Simplified Management and vendor neutrality.   Implementing VPLEX for all storage related provisioning tasks would reduce complexity with multiple vendor arrays.  It allows you to manage multiple heterogeneous arrays from a single interface.  It also makes zoning easier as all hosts would only need to be zoned to the VPLEX rather than every array on the floor, which makes it faster and easier to provision new storage to a new host.

6. Increased leverage among vendors.  This advantage would be true with any virtualization device.  When controller based storage virtualization is employed, there is more flexibility to pit vendors against each other to get the best hardware, software and maintenance costs.  Older arrays could be commoditized which could allow for increased leverage to negotiate the best rates.

7. Use older arrays for archiving. Data could be seamlessly demoted or promoted to different arrays based on an array’s age, its performance levels, and its related maintenance costs.  Older arrays could be retained for capacity and demoted to a lower tier of service, and even with the increased maintenance costs it could still save money.

8. Scale.  You can scale it out and add more nodes for more performance when needed.  With a VPLEX Metro configuration, you could configure VPLEX with up to 16 nodes in the cluster between the two sites.

What are some possible disadvantages of VPLEX?

1. Licensing Costs. VPLEX is not cheap.  Also, it can be licensed per frame on the VNX but must be licensed per TB on the CX series, so your large, older CX arrays will cost a lot more to license.

2. It’s one more device to manage.   The VPLEX is an appliance, and it’s one more thing (or things) that has to be managed and paid for.

3. Added complexity to infrastructure.  Depending on the configuration, there could be multiple VPLEX appliances at every site, adding considerable complexity to the environment.

4. Managing mixed workloads in virtual environments.  When heavy workloads are all mixed together on the same array there is no way to isolate them, and the ability to migrate a workload non-disruptively to another array is one of the reasons to implement a VPLEX.  In practice, however, those VMs may end up being moved to another array with the same storage limitations as the one they came from.  The VPLEX could simply be solving a problem temporarily by moving that problem to a different location.

5. Lack of advanced features. The VPLEX has no advanced storage features such as snapshots, deduplication, replication, or thin provisioning.  It relies on the underlying storage array for those types of features.  As an example, you may want to utilize block based deduplication with an HDS array by placing a NetApp V-Series in front of it and using NetApp’s dedupe to enable it.  That is only possible with a NetApp V-Series or HDS USP-V type device; the VPLEX can’t do it.

6. Write cache performance is not improved.  The VPLEX uses write-through caching, while its competitors’ storage virtualization devices use write-back caching. When there is a write I/O in a VPLEX environment, the I/O is cached on the VPLEX, but it is passed all the way back to the virtualized storage array before an ack is sent back to the host.  The NetApp V-Series and HDS USPV will store the I/O in their own cache and immediately return an ack to the host; at that point the I/Os are flushed to the back end storage array using their respective write coalescing and cache flushing algorithms.  Because of this write-back behavior it is possible to see a performance gain above and beyond the performance of the underlying storage arrays due to the caching on these controllers.  There is no performance gain for write I/O in VPLEX environments beyond the existing storage due to the write-through cache design.

Using Microsoft’s BranchCache with the Celerra & VNX

We recently moved the data from several of our small offices to being hosted at a central regional data center. In one case, a site that formerly had its own Celerra for file access was now accessing NAS through a WAN link.  As expected, access to files was much slower.  After doing some research on how to speed up file access for users at the branch office locations, I came across Microsoft’s BranchCache feature.

BranchCache is a WAN bandwidth optimization technology available in Windows 7 & 8, Server 2008 R2 and Server 2012. When BranchCache is enabled, it creates a cache of the content from the file server locally within the branch office. A client from the same network can then access the file very quickly from cache instead of having to download it across the WAN again. It’s a great feature for optimizing local link utilization, increasing the responsiveness of applications, and reducing WAN bandwidth consumption.

BranchCache can operate in either Hosted Cache mode or Distributed Cache mode. In Hosted Cache mode, a Windows server is configured to store the cached files. In Distributed Cache mode (appropriate for smaller sites), local clients in the office keep a copy of the content and make it available to other clients that access the same files.

Once your Windows administrator has BranchCache configured on a server (or it’s been enabled on the local client PCs), enabling it on the Celerra/VNX side is very simple. Log in to the CLI and su to gain root credentials, then type the following command:

server_cifs server_2 -smbhash -service enable

If you’d like to enable BranchCache auditing so the Windows server administrators can see audit info in the Windows event logs, type this command:

server_cifs server_2 -smbhash -audit enable

After that, you will need to restart the CIFS service on the data mover. Here are the commands to stop and start CIFS:

server_setup server_2 -P cifs -o stop
server_setup server_2 -P cifs -o start

To confirm that BranchCache was successfully enabled, type the following command:

server_cifs server_2 -smbhash -info

The output should look like this:

server_2:
Current smbhash parameters:
—————————
Enabled : Yes
Started : Yes

Finally, to enable hash support on each individual CIFS share that you want to use for BranchCache clients, use the command syntax below.  Note that EMC’s “Configuring BranchCache” document that I link to at the bottom of this post does not contain this next command (a major oversight); it was pulled from page 58 of EMC’s Configuring CIFS guide.

# server_export <datamover_name> -name <fs_name> -option <netbios_name_of_CIFS_Server>, type=hash /<fs_name>
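
As a concrete (hypothetical) example, substituting real values into that syntax would look something like this, assuming a data mover named server_2, a file system and share named fs01, and a CIFS server named CIFSSERVER01:

# Hypothetical values plugged into the syntax shown above
server_export server_2 -name fs01 -option CIFSSERVER01,type=hash /fs01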

EMC has a detailed document on how to configure BranchCache, including all the steps you’d need to take to configure the server and PC clients. If you have an EMC support account you can download it here:

https://support.emc.com/docu42265_Configuring-BranchCache-V2-on-VNX.pdf?language=en_US

There is also a lengthy section on BranchCache in EMC’s CIFS Management for VNX guide here:

https://mydocs.emc.com/VNXDocs/CIFS.pdf.

Here is the link to Microsoft’s TechNet guide for BranchCache:

http://technet.microsoft.com/en-us/library/hh831696.aspx.

Creating an NFS export from a file system on a Virtual Data Mover (VDM)

When creating a new file system on a virtual data mover, you may have noticed that it can’t be NFS exported from within Unisphere.  In the GUI, file systems associated with a VDM do not appear in the drop down list for NFS exports when you attempt to create one.  I had always assumed it simply couldn’t be done, until we had a business need for it and I investigated a bit further.  As it turns out, NFS exports can be created on a VDM path from the command line interface, with a few restrictions.

1. You need to export the NFS mountpoint at the physical Data Mover level (server_2, server_3, etc.) and include the nested path of the VDM.  The path is generally /root_vdm_X/fs_name. You cannot create the export at the VDM level (see the example after this list).

2. Since the NFS export is done at the physical Data Mover level it is not restricted to one specific VDM.

3. The nested NFS export can only be done from the command line; it is not an option from within Unisphere.  Once you have created the NFS export, however, you can use the GUI to make changes to it.

4. The client must map the NFS export by using celerra:/root_vdm_X/fs_name, or they need to set up NFS export aliasing to hide the nested path.

5. IP replication of the VDM will not replicate the source site NFS exports to the destination site. You will need to manually create those NFS exports on the destination side. Also, you cannot access those file systems unless the VDM has failed over.
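
Here’s what the export command looks like in practice; this is a hedged example where the VDM root path (root_vdm_1), the file system name (fs01), and the client IP are placeholders for whatever exists in your environment:

# NFS export of a VDM-hosted file system, created at the physical Data Mover level
server_export server_2 -Protocol nfs -option root=10.1.1.50,rw=10.1.1.50 /root_vdm_1/fs01

# The client then mounts it using the nested path, for example:
# mount celerra:/root_vdm_1/fs01 /mnt/fs01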

What is EMC’s CAVA / Common Event Enabler?

I was recently asked to do a bit of research on EMC’s CAVA product, as we are looking for AntiVirus solutions for our CIFS based shares.  I found very little info with general google searches about exactly what CAVA is and what it does, so I thought I’d share some of the information that I did find after a bit of research and talking to my local EMC rep. 

Basically, CAVA is a service that runs on the Celerra (or VNX) data mover in conjunction with a Windows server running a 3rd party anti-virus engine (along with EMC’s CAVA API agent) to handle the conversation.  It only facilitates the communication to an existing AV server; EMC doesn’t provide the actual AV software.  It supports Symantec, McAfee, eTrust, Sophos, Kaspersky, and Trend Micro.  In a nutshell, CAVA employs three key components:  software on the data mover (the VC client), software on a Windows AV server (CAVA), and your 3rd party AV engine on a Windows server.

CAVA used to stand for “Celerra Anti Virus Agent”, but was changed to “Common AntiVirus Agent”.  Quite convenient that they could re-use the “C” without changing the acronym, right? The product is now officially known as “Common Event Enabler for Windows” by EMC and the package includes CEPA, or the EMC Common Event Publishing Agent, and CAVA, the aforementioned Common Antivirus Agent.  For this post I’m focusing on the Antivirus agent.

CAVA is a fairly straightforward install, however if implemented incorrectly it can adversely affect your performance. It’s important to know how it scans your files and essential to know how to troubleshoot it and do performance monitoring.  There is definitely a performance hit when using CAVA. 

When are files scanned for a virus? 

Each time the Celerra receives a file, it will be locked for read access first, at which time a request is sent to the AV server (or servers) to scan the file.  The Celerra sends the UNC path name to the Windows server and waits for verification that the file is not infected.  Once that verification is complete, the file is made available for user access.

CAVA will scan a file in the following instances: 

  •          CAVA will scan files for a virus the first time that a file is read, subsequent to the initial implementation of CAVA and any updates to virus definitions.
  •          Creating, modifying, or moving a file
  •          When restoring a file (or files) from backup
  •          When renaming a file with a different file extension
  •          Whenever an administrator performs a full file system scan (with the server_viruschk command; see the example below)
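
For reference, checking on the virus checker from the Control Station is just the bare command with the data mover name; run with no other options it displays the current virus checker configuration and state (a hedged example, using server_2):

server_viruschk server_2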

What are the features of CAVA? 

  •          Automatic Virus Definition Updates. Files opened after the update will be re-scanned.
  •          CAVA Calculator (a free sizing tool to assist in implementation)
  •          User notifications on virus detection, configurable by administrators to be sent as notifications to the client, event log entries, or both.
  •          Scan on read can be enabled
  •          Event reporting and configuration 

What are some implementation considerations? 

  •          EMC recommends that an MPFS client system not be configured as the AV server system.
  •          CAVA doesn’t support a data mover CIFS server using share level access.
  •          Always update the viruschecker.conf file to avoid scanning temp files. It can be modified with the Celerra AV Management Snap-In.
  •          It’s CIFS only. There is no support for NFS or FTP.  If those protocols are used to open, modify, or move files the files will not be scanned.
  •          You must check for compatibility with your installed 3rd party AV software.

How is it licensed, and how much does it cost?

CAVA is licensed per array, on the VNX series it is in the Security and Compliance Suite.   Pricing will vary of course, but it’s not very expensive relative to the cost of the array.  It should be in the range of thousands rather than tens of thousands of dollars.

 

Troubleshooting NAS Discovery issues on EMC Ionix Control Center (ECC)

We recently converted two of our existing VNX arrays to unified systems, and I was attempting to add our newly added NAS to Control Center.  I went through the normal assisted discovery using the ‘NAS Container’ option.  Unfortunately, I got an error in the discovery results window.  Here’s the error I saw:

SESSION_ACTION: Discover  [4]  MO Type = NasContainer
Container_IP=10.10.10.4 | Container_Port=443 | Container_Username=root | Container_Password=****** | Container_Type=Celerra
  command status = finished, errors
  objects found = 6  agents responding = 2
  completed in 231 seconds
  action begins at: Wed Nov 06 09:48:05 CST 2013
  action ends at: Wed Nov 06 09:51:56 CST 2013
 
Reported objects:
[1]  NasContainer=10.10.10.4
 
Reported agent errors:
[1]  ADAResult: (9) Celerra@10.10.10.4 : SSH communication failed – Please verify emcplink settings
    Responding agent: NAS Agent @ eccagtserver.rgare.net
[2]  ADAResult: (9) Celerra@10.10.10.4: nas_version returned invalid response
    Responding agent: NAS Agent @ eccinfserver.rgare.net
Responding agents:
[1]  NAS Agent @ eccagtserver.rgare.net
[2]  NAS Agent @ eccinfserver.rgare.net

 

It looks like there’s an SSH setting that’s incorrect.  Being unfamiliar with the emcplink utility, I did a bit of research on how to configure it properly, and I’ll go through what needs to be done.

Before diving in to using and configuring emcplink, here are some simple troubleshooting steps you should run through first:

   – Verify that the NAS agent is installed and active.  You can view all of the running agents by clicking on the gear icon on the lower right hand side (in the status bar).  Scroll through the agents and make sure the agent is active.

   – Verify that the Java Process is running on the Control Station.

         * Log in to the Control Station and type the following command:

                      ps -aex |grep java

          * If it’s running, you will see lines similar to the following:

                      21927 ? S 0:15 /usr/java/bin/java -server …..

                      22200 ? S 0:00 /usr/java/bin/java -server …..

   – Make sure that the ssh daemon is running (I’m assuming you’re using ssh for remote connectivity):

           * Log in to the Control Station and type the following command:

                       ps -aex |grep sshd

           * If it’s running, you will see a line similar to the following:

                       882 ?      Ss     0:00 /usr/sbin/sshd

   – Verify that the Celerra (or VNX) data mover is connected to an array

                       nas_storage -list

   – Verify that the user name and password you’re using during the assisted discovery work by trying to log in with that ID/password directly.

Here are the troubleshooting steps I took, and some more info about emcplink:

What is emcplink? It’s a utility that allows you to specify security policies for secure shell (SSH) client authentication, which is required for the Storage Agent for NAS to discover NAS containers.

The highest SSH security level (full security) requires that users manually run emcplink in order to provide a username and password for SSH authentication, and to manually accept an SSH key returned from emcplink, to discover the NAS container.  After it’s accepted, the key is stored on the NAS Agent host. If the key changes, you must manually run emcplink again and accept the changed key to rediscover the NAS container. If your environment does not require full SSH security, use emcplink to set lower security levels that will automatically accept new or changed keys without requiring the manual entry of SSH usernames, passwords, and keys.

The emcplink command is a command line utility. To run emcplink, first open a command prompt window.  Change to the <install_root>/exec/CNN610 directory on the host where Storage Agent for NAS resides, where <install_root> is the ControlCenter infrastructure install directory.

If your installation uses SSH version 2, update your agent configuration so emcplink uses SSH version 2 when handling SSH keys; the default is version 1. Note that SSH version 2 is not backward compatible with version 1. If you switch to SSH version 2, you must run emcplink again to rediscover all NAS containers that were previously discovered with SSH version 1.

If you want to update your install to version 2, follow these steps (I did this during my troubleshooting):

1. Stop Storage Agent for NAS using the ControlCenter Console.

2. Edit the following file:

      <install_root>/exec/CNN610/cnn.ini

3. In cnn.ini [ssh] change version = 1 to version = 2

4. Save and exit cnn.ini.

5. Restart Storage Agent for NAS.

The next step is to enable the policy that you need for your environment. The default policy is EMC_SSH_KEY_SECURITY_FULL.  You can add more than one policy.  If added policies contradict one another, the most recently added policy takes effect.

Enter the following command to add or remove an SSH security policy:

       emcplink -setpolicy [+|-]policy_name

       Example:  emcplink -setpolicy +EMC_SSH_KEY_SECURITY_FULL

In my case, I first disabled the default policy, then enabled the policy that I wanted. Here are the commands I ran:

       emcplink -setpolicy -EMC_SSH_KEY_SECURITY_FULL

       emcplink -setpolicy +EMC_SSH_KEY_SECURITY_ALLOW_NEW

After running it, I used the ‘getpolicy’ option to verify what the current active policy was:

       emcplink -getpolicy

The output looks like this:

          Policy Is:
              EMC_SSH_KEY_SECURITY_ALLOW_NEW
 

Here are the policy options you can choose from:

EMC_SSH_KEY_SECURITY_FULL (default)

Do not automatically accept any new or changed NAS container keys. They must be accepted manually using emcplink (refer to emcplink -interactive). Provides the same functionality that plink (no longer valid) provided in previous ControlCenter versions, when manual user name/password/key entry was required.

EMC_SSH_KEY_SECURITY_ALLOW_NEW

Accept new keys, but not changed keys. SSH authentication occurs automatically for initial discovery, and also for subsequent discoveries as long as the NAS Container key does not change. If a key is changed, discovery is attempted via telnet.

EMC_SSH_KEY_SECURITY_ALLOW_CHANGE

Accept changed keys, but not new keys. When a Celerra is initially discovered, SSH authentication occurs manually. If the key is changed for subsequent discoveries of that NAS Container, SSH authentication occurs automatically.

EMC_SSH_KEY_SECURITY_NONE

Accept both new and changed keys. SSH authentication occurs automatically at initial discovery and all subsequent discoveries, regardless of whether NAS container keys are changed.

After I verified the policy I wanted was running, I manually entered SSH security information for each array I wanted to add, to accept the NAS container keys. Run the following command for each array to add the key to the server cache (the -2 optionally tells emcplink to use SSH version 2):

         emcplink -ssh -interactive -2 -pw password username@Array_IP_address


That’s it! Once I accepted all of the ssh keys and re-ran the discoveries, the new arrays were discovered just fine.

 

Changing the Login banner on a Celerra / VNX File control station

If you have a security requirement to change the login banner on operating systems that are accessible via a terminal session, it’s a fairly easy change on a Celerra or VNX File control station because the Control Station OS is based on Linux.  Simply log in as root and edit the /etc/issue file with vi.  It contains the login banner that you see at the login prompt when you connect to a control station.

This is what the default /etc/issue file contains on a VNX control station, and what you’ll see by default when you log in:

A customized version of the Linux operating system is used on the
EMC(R) VNX(TM) Control Station.  The operating system is
copyrighted and licensed pursuant to the GNU General Public License
(“GPL”), a copy of which can be found in the accompanying
documentation.  Please read the GPL carefully, because by using the
Linux operating system on the EMC VNX you agree to the terms
and conditions listed therein.
 
EXCEPT FOR ANY WARRANTIES WHICH MAY BE PROVIDED UNDER THE TERMS AND
CONDITIONS OF THE APPLICABLE WRITTEN AGREEMENTS BETWEEN YOU AND EMC,
THE SOFTWARE PROGRAMS ARE PROVIDED AND LICENSED “AS IS” WITHOUT
WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT
NOT LIMITED TO, THE IMPLIED MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE.  In no event will EMC Corporation be liable to
you or any other person or entity for (a) incidental, indirect,
special, exemplary or consequential damages or (b) any damages
whatsoever resulting from the loss of use, data or profits,
arising out of or in connection with the agreements between you
and EMC, the GPL, or your use of this software, even if advised
of the possibility of such damages.
 
EMC, VNX, Celerra, and CLARiiON are registered trademarks or trademarks of
EMC Corporation in the United States and/or other countries. All
other trademarks used herein are the property of their respective
owners.
 
EMC VNX Control Station Linux release 3.0 (NAS 7.1.71)
 

I don’t work for IBM, but below is an example of what you could change the /etc/issue file to.  I included the hostname of the control station, our company logo, and a legal warning regarding unauthorized logins and system monitoring.  Because the standard login banner from EMC includes the DART OS version on the last line, I suspect that the /etc/issue file is updated along with any OS upgrades or patches, although I can’t confirm that right now.  I’m keeping a backup copy in the same directory so I can easily change it back after a DART upgrade.

You can also edit the /etc/motd file, which displays immediately after a successful login.  By default the file contains the message “EMC Celerra Control Station Linux” or “EMC VNX Control Station Linux”, along with what I assume to be the date of the original OS installation.

---------------------------------------------------------------
EMC VNX Control Station <Host_Name>
---------------------------------------------------------------
IIIIIIIIIIIIIIIIIII  BBBBBBBBBBBBBBBBB    MMMMMMM     MMMMMMM
IIIIIIIIIIIIIIIIIII  BBBBBBBBBBBBBBBBBB   MMMMMMMM   MMMMMMMM
       IIII          BBBB           BBBB  MMMM MMMM MMMM MMMM
       IIII          BBBB           BBBB  MMMM MMMM MMMM MMMM
       IIII          BBBBBBBBBBBBBBBBB    MMMM  MMMMMMM  MMMM
       IIII          BBBBBBBBBBBBBBBBB    MMMM   MMMMM   MMMM
       IIII          BBBB           BBBB  MMMM    MMM    MMMM
       IIII          BBBB           BBBB  MMMM     M     MMMM
IIIIIIIIIIIIIIIIIII  BBBBBBBBBBBBBBBBBB   MMMM           MMMM
IIIIIIIIIIIIIIIIIII  BBBBBBBBBBBBBBBBB    MMMM           MMMM
---------------------------------------------------------------
Warning: This system is restricted to IBM authorized users for
business purposes only. Unauthorized access or use is
a violation of of company policy and the law.
---------------------------------------------------------------
This system may be monitored for administrative and security
reasons. By logging in, you agree that you have read and
understand this notice.
---------------------------------------------------------------
nasadmin@<Host_Name>'s password:
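
Here’s roughly how I keep that backup copy and make the edit; nothing fancy, just standard file commands run as root on the control station:

# Save the stock banner before editing, then edit the live file
cp /etc/issue /etc/issue.emc_default
vi /etc/issue

# Same idea for the post-login message
cp /etc/motd /etc/motd.emc_default
vi /etc/motd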