Administrators Guide

Simpool Setup

TODO:

  • networking
  • master node selection
  • dnsmasq
  • condor setup
  • nfs root image
  • hostname assignment
  • node setup pxe boot
  • node formatting
  • glusterfs setup

Handling Users

Adding Users to Simpool

1) Create a new home directory

$ useradd -m <uniqname>

The above creates the user's home directory. Always use their UofM uniqname.

2) Add their public key to their authorized keys file

$ cd /home/<uniqname>/.ssh
$ cat <public_key_they_sent> >> authorized_keys

The new user should send the admin a public key created on the computer from which they intend to log in to simpool.

3) Change the ownership of their .ssh directory to them

$ cd /home/<uniqname>
$ chgrp -R <uniqname> .ssh/
$ chown -R <uniqname> .ssh/
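
If key-based login still fails, note that sshd usually also requires restrictive permissions on these files; a possible fix (not part of the original procedure):

$ chmod 700 /home/<uniqname>/.ssh
$ chmod 600 /home/<uniqname>/.ssh/authorized_keys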

Adding Users to Nodes

1) Have them create their own key pair on simpool

$ ssh-keygen -t rsa

This will create id_rsa (the private key) and id_rsa.pub (the public key) in their /home/<uniqname>/.ssh/ folder.

2) Ask them to copy their public key to the authorized_keys file in the same folder.

$ cd /home/<uniqname>/.ssh
$ cat id_rsa.pub >> authorized_keys

3) Copy the lines corresponding to the user from the first file into the second (a sketch follows the notes below)

/etc/passwd  ->  /nfsroot/raring_64/etc/passwd
/etc/shadow  ->  /nfsroot/raring_64/etc/shadow
/etc/group   ->  /nfsroot/raring_64/etc/group

In /etc/shadow, make sure there is a * between the first and second colon.

example: sabeyrat:*:16456:0:99999:7:::

In /etc/passwd, make sure the shell is set to /bin/bash.
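
A minimal sketch of this step for one user (illustrative only; <uniqname> is a placeholder, and the sed expression simply replaces the password field with *):

$ grep "^<uniqname>:" /etc/passwd >> /nfsroot/raring_64/etc/passwd
$ grep "^<uniqname>:" /etc/shadow | sed 's/^\([^:]*\):[^:]*:/\1:*:/' >> /nfsroot/raring_64/etc/shadow
$ grep "^<uniqname>:" /etc/group >> /nfsroot/raring_64/etc/group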

Deleting Users from Simpool & Nodes

1) Remove the user on the head node

$ userdel <uniqname>

Note that userdel alone does not delete /home/<uniqname>; add the -r flag if the home directory should be removed as well.

2) Remove entries from image

$ cd /nfsroot/raring_64/etc

Delete the corresponding lines from passwd, shadow, and group files.
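
One way to do this in a single step (an illustrative sketch; <uniqname> is a placeholder):

$ cd /nfsroot/raring_64/etc
$ sed -i "/^<uniqname>:/d" passwd shadow group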

Performing Administrator Tasks

To perform admin tasks, the administrator must be “root”. Administrators must log in to simpool directly as root.

$ ssh root@simpool.eecs.umich.edu

Adding/Removing Admins in Simpool

Giving a user admin status

$ adduser <uniqname> sudo

In /etc/shadow, make sure there is a * between the first and second colon.

example: sabeyrat:*:16456:0:99999:7:::

Removing admin status

$ deluser <uniqname> sudo

Updating and Upgrading

Updating the simpool

$ apt-get update
$ apt-get upgrade

Don’t upgrade the kernel!

Updating the nodes

$ cd /nfsroot/raring_64
$ chroot .
$ apt-get update
$ apt-get upgrade
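
When the upgrade inside the image finishes, leave the chroot before doing anything else on the head node:

$ exit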

Reserving Nodes for a Particular User

Log in to the node to be reserved, e.g. m60-004

$ ssh m60-004
$ cd /etc/condor/config.d
$ vi 00debconf
# allow Condor jobs to run with the same priority as any other machine activity
# always start jobs once they are submitted
#START = TRUE
START = ( ( User == "bcoh@simpool" ) )

or

START = FALSE # allow a single user to manually log in to the node and run jobs there. Condor will never schedule any jobs to this node.
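
After changing the file, have the node's HTCondor daemons re-read their configuration (assuming condor_reconfig is in the default path on that node):

$ condor_reconfig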

Simpool Machines Overview

Machine names, machine type, sockets/cores, memory, and processor for each group:

  • m50-001 to m50-004: 4 processors (2 sockets, 2 cores/socket); Dual Core AMD Opteron 280
  • m52-001 to m52-004: Dell towers; 4 processors (1 socket, 4 cores/socket); Core2 Quad Q6600 @ 2.40GHz
  • m53-001 to m53-010: SunFire XX2200; 8 processors (2 sockets, 4 cores/socket); 8GB DRAM; Quad-Core AMD Opteron 2376
  • m55-001 to m55-006: Dell PowerEdge 2950; 8 processors (2 sockets, 4 cores/socket); Xeon E5430 @ 2.66GHz (“Harpertown”, 45nm, released 11/11/2007)
  • m60-001 to m60-011 and simpool: Supermicro X8STi; 12 processors (1 socket, 6 cores/socket w/ SMT); 24GB DRAM; Xeon X5670 @ 2.93GHz (“Westmere-EP”, 32nm, released 03/16/2010)

M60’s have one PCI slot, one x8 PCIe slot, and one x16 PCIe slot.

Slide Rails for Dell PowerEdge 2950 can be ordered at: http://accessories.us.dell.com/sna/productdetail.aspx?c=us&l=en&s=bsd&cs=04&sku=A0741949

Other Tasks

Attaching an External Storage Device

1) Plug in the device to the "simpool" node

2) Find out the /dev path of the device. In this case, it is sdb1

$ dmesg
[2180219.821005] usb 1-2: new high-speed USB device number 2 using ehci-pci
[2180219.954416] usb 1-2: New USB device found, idVendor=1058, idProduct=1021
[2180219.954421] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[2180219.954424] usb 1-2: Product: Ext HDD 1021
[2180219.954427] usb 1-2: Manufacturer: Western Digital
[2180219.954430] usb 1-2: SerialNumber: 574341575A30333935313930
[2180219.955119] scsi5 : usb-storage 1-2:1.0
[2180220.955116] scsi 5:0:0:0: Direct-Access     WD       Ext HDD 1021     2021 PQ: 0 ANSI: 4
[2180220.956152] sd 5:0:0:0: Attached scsi generic sg2 type 0
[2180220.957179] sd 5:0:0:0: [sdb] 732566016 4096-byte logical blocks: (3.00 TB/2.72 TiB)
[2180220.959157] sd 5:0:0:0: [sdb] Write Protect is off
[2180220.959163] sd 5:0:0:0: [sdb] Mode Sense: 17 00 10 08
[2180220.961148] sd 5:0:0:0: [sdb] No Caching mode page present
[2180220.961478] sd 5:0:0:0: [sdb] Assuming drive cache: write through
[2180220.962919] sd 5:0:0:0: [sdb] 732566016 4096-byte logical blocks: (3.00 TB/2.72 TiB)
[2180220.966970] sd 5:0:0:0: [sdb] No Caching mode page present
[2180220.967295] sd 5:0:0:0: [sdb] Assuming drive cache: write through
[2180221.028708]  sdb: sdb1
[2180221.030129] sd 5:0:0:0: [sdb] 732566016 4096-byte logical blocks: (3.00 TB/2.72 TiB)
[2180221.034917] sd 5:0:0:0: [sdb] No Caching mode page present
[2180221.035243] sd 5:0:0:0: [sdb] Assuming drive cache: write through
[2180221.035599] sd 5:0:0:0: [sdb] Attached SCSI disk
[2180867.603299] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)

2a) Another way to find the device is with lsblk. You can verify the device by cross-checking its size.

$ lsblk
NAME                          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                             8:0    0 465.8G  0 disk 
├─sda1                          8:1    0   243M  0 part /boot
├─sda2                          8:2    0     1K  0 part 
└─sda5                          8:5    0 465.5G  0 part 
  ├─simpool--vg-root (dm-0)   252:0    0 441.5G  0 lvm  /
  └─simpool--vg-swap_1 (dm-1) 252:1    0    24G  0 lvm  [SWAP]
sdb                             8:16   0   2.7T  0 disk 
└─sdb1                          8:17   0   2.7T  0 part /home/sabeyrat/extern_hdd
sr0                            11:0    1  1024M  0 rom  

3) Manually mount the device to a desired location. Only the root user can perform this task.

$ mkdir ~/usbdevice
$ mount /dev/sdb1 ~/usbdevice

4) To give a user write access to the device, change the ownership of the mount point

$ chown sabeyrat -R ~/usbdevice

5) To unmount the device there are two options

$ umount /dev/sdb1
or
$ umount ~/usbdevice

Simpool Setup

In order to connect to the Simpool cluster, login to the simpool login node by typing:

$ ssh <uniqname>@simpool.eecs.umich.edu

All of the machines are connected via an internal network and their IP addresses are assigned between 10.11.0.3 and 10.11.0.255. The address 10.11.0.1 belongs to simpool. The list of all client nodes can be found in the /etc/dnsmasq.conf file. This file maps the MAC addresses of all the clients to their names: m50-001, m50-002, m50-003, etc.
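
For reference, a static assignment in dnsmasq.conf looks roughly like the line below (the MAC address and IP here are made up; check the real file for the actual entries):

dhcp-host=00:11:22:33:44:55,m50-001,10.11.0.50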

PXE Boot

All of the clients boot over the network from an image stored on the simpool node. Their BIOS settings are already configured to boot this way. They will automatically look for this image.

The folder on the simpool node that contains the clients' files is /nfsroot. Files that the client will look for during boot are:

/nfsroot/tftpboot/initrd.img-3.8.0-19-generic 
/nfsroot/tftpboot/vmlinuz-3.8.0-19-generic

The client knows to look for these files because they are specified in /nfsroot/tftpboot/pxelinux.cfg/default.

root@simpool:~# cat /nfsroot/tftpboot/pxelinux.cfg/default
DEFAULT menu.c32
PROMPT 0
TIMEOUT 1
LABEL linux
	KERNEL vmlinuz-3.8.0-19-generic
	APPEND ramdisk_size=65536 initrd=initrd.img-3.8.0-19-generic root=/dev/ram0 ip=dhcp

Upgrading the Kernel

Proceed with caution. Previous attempts to upgrade the kernel have been unsuccessful and have brought down the entire cluster.

In order for everything to function properly, the kernel on the simpool node must match the kernel on the remaining machines.

A new image can be generated using initramfs tools in /etc/initramfs-tools. The two important files in this folder are:

  • initramfs.conf
  • modules

There are two versions of these two files: one version for the simpool node, and the other version for the remaining machines. Those files are backed up as:

  • initramfs.conf.simpool
  • modules.simpool
  • initramfs.conf.nodes
  • modules.nodes

To generate a new image, type:

$ mkinitramfs -o /path/to/output/file/initrd.img-`uname -r`
For example,
$ mkinitramfs -o /nfsroot/tftpboot/initrd.img-`uname -r`

This will create the file initrd.img-3.8.0-19-generic inside /nfsroot/tftpboot. Make sure this name matches the file in /boot: the two images are built differently, but they must share the same name so that simpool and the nodes stay on the same kernel version. Once you are done, make sure the original initramfs.conf and modules files reflect the simpool configuration and not the node configuration.
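
One possible sequence for building the node image and then restoring the simpool configuration, using the backup files listed above (a sketch, not a verified procedure):

$ cd /etc/initramfs-tools
$ cp initramfs.conf.nodes initramfs.conf
$ cp modules.nodes modules
$ mkinitramfs -o /nfsroot/tftpboot/initrd.img-`uname -r`
$ cp initramfs.conf.simpool initramfs.conf
$ cp modules.simpool modules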

Make sure the simpool version of initramfs.conf reads

root@simpool:/etc/initramfs-tools# cat initramfs.conf
#
# initramfs.conf
# Configuration file for mkinitramfs(8). See initramfs.conf(5).
...
BOOT=local
...
DEVICE=
...
NFSROOT=auto

and that the node file reads

root@simpool:/etc/initramfs-tools# cat initramfs.conf.nodes 
#
# initramfs.conf
# Configuration file for mkinitramfs(8). See initramfs.conf(5).
#
...
BOOT=nfs
...
DEVICE=eth0
...
NFSROOT=10.11.0.1:/nfsroot/raring_64

???

Next, you probably need to copy the vmlinuz file from /boot to /nfsroot/tftpboot, and then update the /nfsroot/tftpboot/pxelinux.cfg/default file.
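
A sketch of that copy step (assuming the running kernel is the one being deployed):

$ cp /boot/vmlinuz-`uname -r` /nfsroot/tftpboot/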

root@simpool:~# cat /nfsroot/tftpboot/pxelinux.cfg/default 
DEFAULT menu.c32
PROMPT 4
TIMEOUT 10
LABEL Linux
	KERNEL vmlinuz-3.8.0-19-generic
	APPEND ramdisk_size=65536 initrd=initrd.img-3.8.0-19-generic root=/dev/ram0 ip=dhcp

Then restart all the nodes. Then restart the simpool node.

???

GlusterFS

We have built the distributed file system across multiple server nodes using GlusterFS. Our Gluster volume is of type “Distribute” only, not “Replicate”; typing “$ gluster volume info” will confirm this. A disk or server failure in distributed volumes can result in a loss of data because directory contents are spread randomly across the bricks.

Gluster terminology:

  • glusterd = The management daemon that needs to run on all servers in the storage pool.
  • brick = The basic unit of storage, an export directory on a server in the storage pool.
  • volume = The logical collection of bricks.
  • client = The machine that mounts the volume. Usually an external machine, but can be one of the servers. In our case, the client is the simpool.
  • server = The machine which hosts the actual file system in which the data will be stored.

GlusterFS can be installed with steps similar to the following (assuming Ubuntu 14.04 LTS):

$ sudo apt-get install software-properties-common
$ sudo add-apt-repository ppa:gluster/glusterfs-3.5
$ sudo apt-get update
$ sudo apt-get install glusterfs-server

Tip: to check which packages have been installed, run:

$ dpkg --get-selections | grep glusterfs

The hard disk in every machine (e.g. /dev/sda) was partitioned as follows:

root@m53-008:~# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 232.9G  0 disk 
|-sda1   8:1    0  15.6G  0 part [SWAP]
|-sda2   8:2    0     1K  0 part 
|-sda5   8:5    0   957M  0 part 
|-sda6   8:6    0   957M  0 part /var/lib/glusterd
|-sda7   8:7    0  18.6G  0 part /tmp
`-sda8   8:8    0 196.8G  0 part /gluster/brick

The “/gluster/brick” partition is where the portion of the distributed file system is stored on that machine. A record of the partition was made in /etc/fstab.

$ ssh m60-001
$ cat /etc/fstab
/dev/sda1 none swap sw 0 0
/dev/sda7 /tmp ext3 defaults 0 2
/dev/sda6 /var/lib/glusterd/ ext3 defaults 0 0
/dev/sda8 /gluster/brick xfs defaults 0 0
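
If a node's brick partition ever has to be recreated, a minimal sketch based on the layout above (device names are examples; double-check with lsblk before formatting anything):

$ mkfs.xfs /dev/sda8
$ mkdir -p /gluster/brick
$ mount /gluster/brick    # uses the /etc/fstab entry shown above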

The distributed volume was created by following these steps:

1. Create a trusted storage pool
2. Create the volume (“gluster volume create …”)
3. Start the volume (“gluster volume start …”)
4. Mount the volume
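
A rough sketch of those steps with just two bricks (the real volume spans many more bricks; the hostnames here are examples):

$ gluster peer probe m60-001
$ gluster peer probe m60-002
$ gluster volume create home transport tcp m60-001:/gluster/brick m60-002:/gluster/brick
$ gluster volume start home
$ mount -t glusterfs simpool:/home /home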

Some helpful commands with glusterfs are:

$ gluster volume info
$ gluster peer status

When the volume has just been created, its status reads “Created.” Our Gluster volume type is “Distribute” only, not “Replicate.” To start the volume, type “gluster volume start <volume_name>”; the status will then read “Started.”

Volume Name: home
Type: Distribute
Volume ID: e7456921-9fe2-4617-8b17-8b0142891f75
Status: Started
Number of Bricks: 19
Transport-type: tcp
...

Mounting the filesystem: If the /home/ directory on simpool is empty, then the gluster volume has not been mounted. It can be mounted manually with:

$ mount -t glusterfs simpool:/home /home

Unmounting the filesystem: To unmount and stop the volume mounted at /home, type:

$ umount /home
$ gluster volume stop home


Disk/server failure in distributed volumes can result in a loss of data because directory contents are spread randomly across the bricks.

Troubleshooting

Issue 1: Can’t log in to simpool (except as root) and /home/ is empty.

Solution: Manually mount the home directory.

$ mount -t glusterfs simpool:/home /home

Issue 2: You get the error “ssh: connect to host m50-002 port 22: No route to host”

root@simpool:~# ssh -v m50-002
OpenSSH_6.6.1, OpenSSL 1.0.1f 6 Jan 2014
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: Connecting to m50-002 [10.11.0.145] port 22.
debug1: connect to address 10.11.0.145 port 22: No route to host
ssh: connect to host m50-002 port 22: No route to host

Solution: Check to see if the machine is powered on. Then, check to see if the ethernet cable is connected to the network switch and the node.

Issue 3: You get the error “ssh: Could not resolve hostname m50-001: Name or service not known”

root@simpool:~# ssh -v m50-001
OpenSSH_6.6.1, OpenSSL 1.0.1f 6 Jan 2014
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
ssh: Could not resolve hostname m50-001: Name or service not known

This happens when the node has been offline for a long time: its DHCP lease expires, so dnsmasq forgets the assigned IP address and the hostname no longer resolves until the node comes back up and obtains a new lease.

Issue 4: "Transport endpoint not connected" error. The glusterfs distributed filesystem has become non-functional due to a node failure (in this case m53-010). The following log shows how I resolved this problem.

root@simpool:~# gluster volume info
 
Volume Name: home
Type: Distribute
Volume ID: e7456921-9fe2-4617-8b17-8b0142891f75
Status: Started
Number of Bricks: 19
Transport-type: tcp
Bricks:
Brick1: m60-001:/gluster/brick
Brick2: m60-002:/gluster/brick
Brick3: m60-004:/gluster/brick
Brick4: m60-005:/gluster/brick
Brick5: m60-006:/gluster/brick
Brick6: m60-007:/gluster/brick
Brick7: m60-008:/gluster/brick
Brick8: m60-009:/gluster/brick
Brick9: m60-010:/gluster/brick
Brick10: m60-011:/gluster/brick
Brick11: m53-001:/gluster/brick
Brick12: m53-002:/gluster/brick 
Brick13: m53-004:/gluster/brick
Brick14: m53-005:/gluster/brick
Brick15: m53-006:/gluster/brick
Brick16: m53-007:/gluster/brick
Brick17: m53-008:/gluster/brick
Brick18: m53-009:/gluster/brick 
Brick19: m53-010:/gluster/brick
Options Reconfigured:
auth.allow: 10.11.0.*
performance.cache-max-file-size: 6GB
nfs.disable: on

root@simpool:~# gluster volume remove-brick home m53-010:/gluster/brick start
volume remove-brick start: success
ID: ee739352-baf4-4b63-8a72-cb4451151053

root@simpool:~# gluster volume remove-brick home m53-010:/gluster/brick status
Node         Rebalanced-files   size     scanned   failures   skipped   status        run-time in secs
---------    ----------------   ------   -------   --------   -------   -----------   ----------------
localhost    0                  0Bytes   0         0                    not started   0.00
m53-008      0                  0Bytes   0         0                    not started   0.00
m53-008      0                  0Bytes   0         0                    not started   0.00
m60-009      0                  0Bytes   0         0                    not started   0.00
m60-001      0                  0Bytes   0         0                    not started   0.00
m60-010      0                  0Bytes   0         0                    not started   0.00
m60-008      0                  0Bytes   0         0                    not started   0.00
m53-004      0                  0Bytes   0         0                    not started   0.00
m53-002      0                  0Bytes   0         0                    not started   0.00
m53-001      0                  0Bytes   0         0                    not started   0.00
m53-005      0                  0Bytes   0         0                    not started   0.00
m53-005      0                  0Bytes   0         0                    not started   0.00
m60-004      0                  0Bytes   0         0                    not started   0.00
m60-007      0                  0Bytes   0         0                    not started   0.00
m60-002      0                  0Bytes   0         0                    not started   0.00
m60-005      0                  0Bytes   0         0                    not started   0.00
m53-006      0                  0Bytes   0         0                    not started   0.00
m60-011      0                  0Bytes   0         0                    not started   0.00
m53-009      0                  0Bytes   0         0                    not started   0.00
m60-006      0                  0Bytes   0         0                    not started   0.00
m53-007      0                  0Bytes   0         0                    not started   0.00

root@simpool:~# gluster volume remove-brick home m53-010:/gluster/brick commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: success

root@simpool:~# gluster volume info
 
Volume Name: home
Type: Distribute
Volume ID: e7456921-9fe2-4617-8b17-8b0142891f75
Status: Started
Number of Bricks: 18
Transport-type: tcp
Bricks:
Brick1: m60-001:/gluster/brick
Brick2: m60-002:/gluster/brick
Brick3: m60-004:/gluster/brick
Brick4: m60-005:/gluster/brick
Brick5: m60-006:/gluster/brick
Brick6: m60-007:/gluster/brick
Brick7: m60-008:/gluster/brick
Brick8: m60-009:/gluster/brick
Brick9: m60-010:/gluster/brick
Brick10: m60-011:/gluster/brick
Brick11: m53-001:/gluster/brick
Brick12: m53-002:/gluster/brick
Brick13: m53-004:/gluster/brick
Brick14: m53-005:/gluster/brick
Brick15: m53-006:/gluster/brick
Brick16: m53-007:/gluster/brick
Brick17: m53-008:/gluster/brick
Brick18: m53-009:/gluster/brick
Options Reconfigured:
auth.allow: 10.11.0.*
performance.cache-max-file-size: 6GB
nfs.disable: on

At this point, the “Transport endpoint not connected” error went away. And I could “rm -rf” directories again! The number of bricks has been reduced from 19 down to 18.

I removed the hard disk from the failed m53-010 node and tried to insert it into the m53-006 node and mount it from there. However, this attempt (recorded below) was unsuccessful.

root@m53-006:~# gluster volume add-brick home m53-006:/gluster/brick_m53_010
volume add-brick: failed: The brick m53-006:/gluster/brick_m53_010 is a mount point. Please create a sub-
directory under the mount point and use that as the brick directory. Or use 'force' at the end of the 
command if you want to override this behavior.
root@m53-006:~# gluster volume add-brick home m53-006:/gluster/brick_m53_010 force
volume add-brick: failed: /gluster/brick_m53_010 or a prefix of it is already part of a volume

Issue 5: Condor is not working. The simpool status page is empty, condor_status returns an error, and “ps aux | grep condor” shows that condor is not running.

Solution: type

$ /usr/sbin/condor_master

This restarts the HTCondor master, whose only job in life is to make sure the other HTCondor daemons are running. The master keeps track of the daemons, restarts them if they crash, and periodically checks to see if you have installed new binaries (and if so, restarts the affected daemons). Source: https://research.cs.wisc.edu/htcondor/manual/v8.2/3_2Installation_Start.html
