Introduction
This is the development log made for my honors capstone project.
My capstone project covers more than a year of work after the COSI network had to be mostly reconfigured. What happened was that COSI has (hopefully "had" now) some old hardware that was vital to the network, specifically the fiber switches (previously F1 and F2). F1 had two power supplies, and one had already failed before all this happened.
When I was a junior on winter break (around Jan 2023), students received emails about power outages that kept occurring on campus. A bit later, there was a flood of messages on Discord (I was pinged a lot because I was lab director then) saying that the network was down. Good way to learn the term "scream test" (credit to Jonathan for teaching me that).
Anyways, the short story of it was that the fiber switch died and there was no connection between COSI and the internet. This point of time was not documented in my development logs originally, but I may still write about it.
Why is there a development log? Well, it was the best way to keep track of all I did as I made this into my capstone project for the Clarkson Honors program. Now I will record it here to be in the appendix of the paper. It could be interesting for someone to read one day. Or myself.
You may have gotten the vibe of this already, but these are all my pure thoughts. There may be swearing. Have fun looking at this and I will attach my final capstone here as well.
I am not the only person who worked on this, and I appreciate everyone who did. I tried to make sure things get done, but I also tried not to be too pushy, especially after I gave up my lab directorship and other people were becoming more and more willing to participate. I believe a lot more cool stuff could be done, but you will see what held us back in the writing.
Okay, that seems fitting for the intro right now. Can always change it later. Happy reading and if you are a reviewer for my capstone... well... good luck to me then.
General Notes
I noticed there were times when I said that I would elaborate in my documents and then I never did. There are also some very undescriptive logs. To remedy this, I will have a summary of what happened each month, as well as add clarifying notes. Notes in the logs will be in the format (Notes).
January 2023
The start of the capstone. One thing that happened before the logs was figuring out that everything was broken. That was fun. Some people, including myself, went up to Clarkson before the Spring semester started to get things less broken. This is talked about more in my actual capstone thesis. We got DNS working, DHCP working, and the OSes on ziltoid and Talos reinstalled.
I was working on getting the virtual machines (vms) on Hydra running after reinstalling the operating system with two lab members, one of whom had never done so before. I had to back up all the vms to Mirror, then put them all back and turn on the ones that were needed (aka should be active). I had some problems here in which I had to challenge my assumptions.
There was some random work here and there, and I was starting to figure out possible problems in the network and what I should do with my capstone.
January 2023 - Logs
Scar — 01/14/2023 11:08 PM
I want to do something with COSI
Scar — 01/21/2023 12:02 AM
Something with the networking
Scar — 01/21/2023 4:43 PM
Picture of working on vm transfer so that hydra's OS can be reinstalled.
Backups on mirror (idea)
Scar — 01/21/2023 5:44 PM
Format for passwords (idea)
Scar — 01/21/2023 7:43 PM
create vlan page on the book (idea)
Scar — 01/21/2023 9:19 PM
Remembering ssh copy ID command
Challenge assumptions: qemu wasn't working out, but I could always try root because I had to remember where I got the backups, plus the user adding to groups (you have to be in the virt-manager group to use virt-manager)
Scar — 01/21/2023 9:33 PM
cannot get into mirror:
tried different computer, checking config, trying to get in through sudo and hydra
going onto a different computer I see that the keys my friend sent were probably from my windows machine
Scar — 01/21/2023 9:50 PM
it was because my keys werent authorized
that sucked
Scar — 01/21/2023 11:35 PM
bottleneck problem, things that go thru private all share a 1gigabit connection
Scar — 01/22/2023 4:14 PM
oh, the pain of no one helping
Scar — 01/22/2023 4:44 PM
adding webhooks to future channel (idea)
Scar — 01/25/2023 6:33 PM
power went out and talks went down
really a lot of physical testing
push in connections all the way
why were the original priv and public down?
Scar — 01/26/2023 1:52 PM
http://terminator.cosi.clarkson.edu/login
tiamat, update hydra, vlans, husky for book
February 2023
Kept working on Hydra and got that all set up.
A researcher under Jeanna was having a hard time connecting to Red Dwarf and Prometheus (research servers). We tried connecting them to different switches and to our own machines. Learned a bit about cable testing and general network debugging.
COSI started communication with OIT about what was happening with the network and specifically our access to the Internet.
February 2023 - Logs
Scar — 02/06/2023 8:19 PM
Hydra:
Had to redo partitions (add filesystem then mount)
Didn't work cause we were one letter off
Then we created pools with zpools (had to write nothing to all partitions)
Scar — 02/07/2023 1:58 AM
MAKE IT A VISUAL TREE (an idea to demonstrate my capstone)
Have the root be a network, and its children the most important things for the network (dns, firewall, other) with their own branches (firewall -> ip tables)
It can be an animation/program
Aaah
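The tree idea above could be sketched as a small data structure. This is only an illustration: the `Node` class and `render` function are hypothetical names, and the branches are just the ones named in the idea (dns, firewall -> ip tables, other).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One item in the network tree, e.g. a service or something it depends on."""
    name: str
    children: list[Node] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> list[str]:
    """Return the tree as indented lines, parents before their children."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Branch names come from the idea above; the shape is only an illustration.
network = Node("network", [
    Node("dns"),
    Node("firewall", [Node("ip tables")]),
    Node("other"),
])

print("\n".join(render(network)))
```

An animation or program could walk the same structure; printing it indented is just the simplest way to see the branches.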
Scar — 02/09/2023 2:21 PM
someone cant talk to prometheus or red dwarf
I couldnt even ping with prometheus.cslabs.clarkson.edu
Scar — 02/10/2023 1:33 PM
went down to COLO and OIT and talked to Joshua and another guy and saw that they have given us the connection (to the internet)
Scar — 02/13/2023 2:04 PM
Red Dwarf and Prometheus were down. Went in to see that it might just be the ethernet connection causing it. I saw that the m3 and m2 switch cable were not connected to the private switch. I connected those and made sure the lights were going up on the switch. Now to try to ping
Scar — 02/13/2023 2:12 PM
Another Destination Host Unreachable
List of strategies:
plug into my computer and use `nmap` and find ip address
hook up to monitor and "hack" in
Scar — 02/15/2023 1:38 PM
back to red dwarf and m3 switch. Switch doesnt seem connected to the internet either
learned how to test cables, which is usually the issue
Scar — 02/16/2023 1:38 PM
Jonathan helped me find the ips of the switches, I realized it was on zones
another email needing those servers (red dwarf and prometheus since they are research servers)
Scar — 02/16/2023 1:53 PM
I remember what I can do, its the same thing as when I was working on Ziltoid:
When it couldnt connect to the internet, but we needed to use a switch, we used a direct ethernet connection so we didnt depend on other connections that it didnt have
I can't get into m3 to check the vlans, but if I could just use a direct connection, then I should be able to ping and get onto the GUI in the browser
I will have to try it out
Scar — 02/19/2023 2:19 PM
talking to Jonathan about which vlans on m3 go where
Scar — 02/20/2023 2:11 PM
getting the hardware for "wave 4" (I was previously calling steps in the plan of this capstone waves. Wave 4 is for when all the hardware is in and we move the servers around)
switch is not the right model
got the transceiver modules (these go on the end of cables to plug into our fiber switches)
Scar — 02/20/2023 2:25 PM
noticed book (cosi documentation) changes were never built in the docker
Scar — 02/20/2023 2:40 PM
trying to get hydra vms back up, but `ssh -X hydra virt-manager` (virt-manager) keeps crashing
Scar — 02/20/2023 2:54 PM
IT WAS BECAUSE I WAS BACKSPACING TOO MANY TIMES (I later found a better command to do what I wanted)
Scar — 02/23/2023 5:16 PM
turns out it was the correct switch (see above)
Scar — 02/24/2023 3:42 PM
got onto switch 3 config via ethernet to computer connection (was almost about to mess up switch 4)
going to check out the vlans and see what is wrong (in the config (If I remember correctly))
Scar — 02/24/2023 4:57 PM
Reorganizing the cables to make the port set up easier
Scar — 02/27/2023 8:09 PM
Jonathan not giving me credit smh (not a big deal. It was in reference to working on the research machines, but he did do the majority of it)
Scar — 02/28/2023 1:43 PM
someone wants to donate to COSI. He said it was an old machine, but still a beef boy. I directed him to our documentation for him to look at
Scar — 02/28/2023 6:09 PM
looking at throughput
I have this idea of an influxdb page for monitoring this type of data
https://www.ittsystems.com/network-throughput/#wbounce-modal
Scar — 02/28/2023 6:36 PM
doesnt seem like we can get an accurate measure of throughput until I am on the network, which the wifi is down rn
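For reference, throughput here just means data moved divided by the time it took. A minimal sketch of that arithmetic, assuming a timed transfer whose byte count is known (the function name and the example numbers are made up for illustration, not measurements from COSI):

```python
def throughput_mbps(bytes_transferred: int, seconds: float) -> float:
    """Average throughput in megabits per second for one timed transfer."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return bytes_transferred * 8 / seconds / 1_000_000


# 125 MB moved in one second is 1000 Mbps, i.e. a full 1 gigabit link.
print(throughput_mbps(125_000_000, 1.0))
```

A number like this only means something when the transfer runs from a machine on the network being measured, which is why it had to wait.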
March 2023
More ideas for the capstone, like the Discord bot that I ended up doing.
There were also some issues with the firewall that other COSI members took care of. Other issues include ssh problems and the start of issues with purchases. Cool thing though, I met someone who was head of the SCaN division at NASA previously and I showed them around COSI.
There were also issues with making a VM for my SC350 (Software D&D) class, but it was a cool learning experience, especially with making that bridge on my own VM.
March 2023 - Logs
Scar — 03/02/2023 8:48 PM
I wanna make a bot in the cosi server so that when something becomes unpingable it lets people know
and an influxdb site
also, a lot of shit happened with the firewall going down, but I couldnt bring myself to help after I learned of a death in the family (good job to the others)
Scar — 03/07/2023 9:10 PM
organize servers (idea)
Scar — 03/08/2023 1:09 PM
reading for literature review. I will be focused on network configurations and measurement based on the Jones - IT article
cleaned out some of the server room a bit
https://en.wikipedia.org/wiki/Simple_Network_Management_Protocol
https://www.netreo.com/blog/network-telemetry-it-executive-guide/
Scar — 03/08/2023 1:27 PM
https://www.itjones.com/blogs/2022/1/8/network-management-best-practices-for-businesses - my first source for network management
Scar — 03/08/2023 1:39 PM
https://www.dnsstuff.com/network-management
Scar — 03/08/2023 3:35 PM
https://en.wikipedia.org/wiki/Simple_Network_Management_Protocol
Scar — 03/25/2023 9:25 PM
ok, I just had this whole thing with making a vm for sc350 so let me recount (I never elaborated on this. This is briefly mentioned elsewhere in logs. There were two main issues: one was getting the vm connected to the internet, and the other was being able to ssh into the vm after it did receive internet connection. The first problem was with the bridge configuration on hydra. At that point, Hydra was connected to both the private and public vlans, and this vm should only be on the private. The other problem stemmed from UFW having port 22 for ssh closed)
Scar — 03/27/2023 5:22 PM
ssh outside cosi is broken
Scar — 03/29/2023 12:28 PM
From nasa: how easy is it to go thru once you are in the network
Scar — 03/30/2023 4:50 PM
the switches wont come for another 11 weeks
April 2023
I had more issues with Hydra and my VM. All I wanted to do was ssh into my VM for my class. This led me down the path of learning about syslogs and also UFW being in existence. UFW was the problem: the VM could clearly connect to the internet to get packages, but I did not know there was another firewall on ubuntu live servers themselves.
I was also doing some organization of the server room.
April 2023 - Logs
Scar — 04/03/2023 4:46 PM
I should reset all the passwords
Scar — 04/03/2023 4:47 PM
Sadly, time got the better of me and I forgot some things. What needs to happen is there needs to be a bridge (over ethernet ports) in the vm host for the vms to connect so you can ssh in. You can look at hydra if you need an example.
(Notes to self) You should also be able to ssh into the vm if you are already at the host
create a vlan page on book
Scar — 04/04/2023 1:42 PM
hydra wasnt down before. I probably was the one who broke it a couple of weeks ago
Scar — 04/12/2023 10:59 PM
Couldn't get m3 (management switch) out for the life of me
Tried for over an hour and with two guys for help
Had to cut my losses and just have m4 (different management switch) unscrewed so I can make progress
Scar — 04/12/2023 11:17 PM
I put in the switch
I plugged in the switch
I labelled private and plugged that in so it is going thru the private switch
I put in red dwarf as well
I can now ping red dwarf
Scar — 04/12/2023 11:26 PM
ethernet uses rj-45, but is not rj-45
Scar — 04/12/2023 11:41 PM
find usb port, copy name,
screen usb-port 38400
Scar — 04/13/2023 12:19 AM
Didn't work!
Scar — 04/13/2023 6:09 PM
tried to just undo the bridge I made a while ago and it still not working
its not the cable
took out all comments, but that shouldnt work
its so fucking slow
it has a destination host unreachable
result
Scar — 04/13/2023 9:33 PM
Hydra has no route to jesabelle but one to google. It can download updates
need to check if it is getting it (packages) from mirror, I suspect not tho because it could not reach mirror before
I keep getting a connection time out
Scar — 04/13/2023 9:42 PM
ssh hydra
ssh: connect to host 128.153.145.42 port 13699: Connection timed out
before, I went into the nftables for our firewall and saw that there was a hole poked, but for port 13699 on hydra, while hydra was still on the default of port 22
going to `man ssh` on hydra shows where the config files are, and that is where you can see in `sshd_config` what port it is set at
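The mismatch above (a firewall hole for port 13699 while sshd was still on 22) is the kind of thing a small script can check. A rough sketch, assuming the usual OpenSSH `Port` directive and ignoring `Match` blocks and included files; `configured_ports` is a hypothetical helper name:

```python
def configured_ports(sshd_config_text: str) -> list[int]:
    """Return the ports set by Port lines, or [22] when none are set."""
    ports = []
    for raw in sshd_config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        parts = line.split()
        if len(parts) == 2 and parts[0].lower() == "port" and parts[1].isdigit():
            ports.append(int(parts[1]))
    return ports or [22]


# A commented-out Port line means sshd falls back to its default of 22.
print(configured_ports("#Port 22\nPermitRootLogin no\n"))
print(configured_ports("Port 13699\n"))
```

On a real host the text would come from reading the sshd config file, and the result can be compared against the port the firewall rule forwards to.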
Scar — 04/13/2023 9:50 PM
I upgraded hydra, it restarted some services
I got `traceroute` on hydra and it seemed to be able to connect to everything in one hop
I was able to ssh into ziltoid from hydra
lets see, next steps is to try and get to hydra again. if not, this might be a firewall problem afterall
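One way to tell "nothing is listening" apart from "something is silently dropping packets" is to look at how a TCP connection attempt fails. A minimal sketch using only the standard library; `probe_tcp` and its return strings are made up for this example, and a timeout is only a hint of a firewall, not proof:

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connection and classify how it went."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"    # host answered, but nothing listens on that port
    except socket.timeout:
        return "timed out"  # no answer at all, often a firewall dropping packets
    except OSError as exc:
        return f"error: {exc}"


# Probing a port on the local machine; the result depends on what is running.
print(probe_tcp("127.0.0.1", 22, timeout=1.0))
```

A "Connection timed out" like the one from `ssh hydra` fits the "timed out" case, which is why a firewall stays on the suspect list.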
Scar — 04/17/2023 6:37 PM
tail -f /var/log/syslog
(A new command learned to output the end of syslog and keep it refreshing)
so I looked at hydra again to see why I cant ssh
i was with another lab member: Peter
Scar — 04/17/2023 6:45 PM
he told me about syslog, and apparently tail has an option called -f that means to constantly refresh
we saw there was something called UFW BLOCK that was blocking that port.
UFW means Uncomplicated Firewall, and every ubuntu server has it and you need to add a hole for ssh into that firewall. Now I can get in
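Those UFW BLOCK lines in syslog are mostly `KEY=value` pairs, so they are easy to pick apart. A rough sketch; the sample line is made up (documentation addresses, invented hostname and interface) and only follows the general shape of such entries:

```python
from __future__ import annotations

import re

FIELD = re.compile(r"(\w+)=(\S*)")

# Made-up sample in the general shape of a UFW block entry, not a real log line.
SAMPLE = (
    "Apr 17 18:40:01 examplehost kernel: [UFW BLOCK] IN=eth0 OUT= "
    "SRC=192.0.2.10 DST=192.0.2.42 PROTO=TCP SPT=51234 DPT=22"
)


def parse_ufw_block(line: str) -> dict[str, str] | None:
    """Return the KEY=value fields of a UFW BLOCK line, or None for other lines."""
    if "[UFW BLOCK]" not in line:
        return None
    return dict(FIELD.findall(line))


fields = parse_ufw_block(SAMPLE)
print(fields["SRC"], "->", fields["DST"], "port", fields["DPT"])
```

`DPT` is the destination port, so a blocked ssh attempt shows up with `DPT=22`.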
Scar — 04/18/2023 5:57 PM
I learned from yesterday and when my vm could not be pinged or ping anything I looked at UFW. There I saw it was disabled
Scar — 04/19/2023 1:35 PM
https://www.comparitech.com/net-admin/open-source-network-monitoring-tools/
Scar — 04/23/2023 6:06 PM
today I went to clean up the server room and do some cable management before an important guest arrives
there should be no tripping hazards in the room
mounted m4 in the correct place
we had to unplug ziltoid and talos for a bit to make things cleaner. This left DNS messy
I had to have others help while I made a phone call and got food
but when I got back, it was still partly down. I got on talos and it was just because nsd was down
July 2023
I was on vacation when Talos (the former primary DNS server) was going down. At first we needed someone to physically reboot the server, but after a while that wasn't enough. I investigated via SSH and saw that nsd.service, used for authoritative DNS, was failing on boot. I learned about systemd config files for services in order to find the configuration file for nsd.service. I was able to change it so that when powered on, Talos will try more times to restart the nsd service. In the process, I also learned about crontab, which is a job scheduler for processes on the system.
The problem was that nsd.service was failing on boot with an error saying it was "restarting too quickly". After some research into the problem in systemd, it turns out that the package has files in /etc/systemd/system/multi-user.target.wants in which you can adjust how systemd interacts with the service. Since it was being restarted too quickly, the relevant options (allowed but not there by default) are StartLimitBurst=x, where x is an int that sets how many start attempts are allowed, and StartLimitInterval=x, where x is the length of the time window in which those attempts are counted. The wait between restart attempts is a separate option, RestartSec.
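As far as the systemd documentation describes it, StartLimitBurst is how many starts are permitted inside one StartLimitInterval window; once that is exceeded, further starts are refused. The sketch below is a simplified model of that counting, not systemd's actual code, and `starts_allowed` is a made-up name:

```python
def starts_allowed(start_times: list[float], burst: int, interval: float) -> list[bool]:
    """For each start attempt (seconds, ascending), say whether it is allowed.

    Simplified model: a window opens at the first attempt, up to `burst`
    attempts are allowed inside it, and a new window opens once `interval`
    seconds have passed since the window opened.
    """
    allowed = []
    window_start = None
    count = 0
    for t in start_times:
        if window_start is None or t > window_start + interval:
            window_start, count = t, 0
        if count < burst:
            count += 1
            allowed.append(True)
        else:
            allowed.append(False)
    return allowed


# Six quick attempts with 5 starts allowed per 10 seconds: the sixth is refused.
print(starts_allowed([0, 1, 2, 3, 4, 5], burst=5, interval=10))
```

Raising the burst, or making each restart wait longer so fewer land in one window, both keep a slow-to-come-up service from hitting the limit.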
July 2023 - Logs
Scar — 07/24/2023 4:03 PM
Talos is breaking down over the summer. I had it rebooted physically then got the service back up
Scar — 07/24/2023 9:49 PM
nsd.service is failing on boot. I learned about systemd options and its config file for services and crontab
Scar — 07/29/2023 6:24 PM
The problem was the nsd.service was failing on boot with error saying it was "restarting too quickly". Through some research of the problem in systemd, turns out that package has files in `/etc/systemd/system/multi-user.target.wants` in which you can adjust how systemd interacts with the service. Since it was being restarted too quickly, you can find that among all other commands allowed and not there by default, there is `StartLimitBurst=x` where x is an int that represents how many tries there should be to restart the service, and `StartLimitInterval=x` where x is an int that sets the time window those tries are counted in (the wait between restart attempts is `RestartSec`).
Scar — 07/29/2023 6:59 PM
There was a "network unreachable" error for some reason before it was fixed; the cause of it was not determined
September 2023
A lot happened this month, and I believe it is because the new fiber switches that are needed for a COSI-COLO connection finally arrived about a week before the start of this new fall semester, about 6 months after COSI put the order in. I invited people to join me in setting up those switches. More people than I expected came and it was hard to fit in the server room and I didn’t know what to do with everyone who showed up.
This was mostly about interacting with the new switches:
- How to get a connection to them
- How to use RouterOS via GUI and some CLI
- Configuring them to the way COSI needed.
- What servers need what VLANs to operate (mostly about which VLANs were necessary for Hydra right now)
I had some trial and error with the configuration of the new switch. It may have been due to the extra elements in VLAN configuration that I filled out, but now they seem to be unnecessary. Also, there was an error in the firewall for packets to go to this new device. After changing the iptables config used for our firewall, we actually took down more of the network, including our weekly meeting tool: talks.cosi.clarkson.edu. We also made sure that we were using the correct transceivers for each port/connection used.
This is also the start of big hardware issues in the servers (besides the one that got us to this place). Talos could not detect its own memory, and that is why there is a new primary DNS server called TalDOS (I like the name) and Talos is being promoted to a secondary server.
September 2023 - Logs
Scar — 09/15/2023 7:59 PM
getting switches together
harder when people won't react
keep it minimum next time
learned command `ip address add 192.168.88.2/24 broadcast + dev enp0s13f0u3` to add ip on a certain connection
Scar — 09/15/2023 10:11 PM
Learned very little, but still something about terminal switch stuff, like add and move directories
I can't make a direct vlan with multiple ports, so I have to make a bridge over the section of ports I want and go from there
Scar — 09/18/2023 5:45 PM
Scar — 09/19/2023 6:47 PM
Yesterday, the switch config didn't work. Specifically, it wouldnt hit hydra
Learning about routerOS
I tried again today, and placed everything back where it was
this was actually the third time, with yesterday being a bust as well
today, I tried to additionally look at the firewall to see if that is where things are going wrong
I did end up changing the firewall iptables config in order to reflect the new ip address, later realized that the mask was also an issue
It is still not getting up
we tried looking at the modules to make sure that they were SFP+ instead of QSFP+ (up to a 10Gbps connection vs a 40Gbps connection).
no dice, and so we put everything back and talks is not going up
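The mask problem mentioned above is easy to check by hand with Python's `ipaddress` module. A minimal sketch: the addresses are taken from these logs or invented for the example, and the prefix lengths are assumptions, not COSI's real masks.

```python
import ipaddress


def same_subnet(host_ip: str, other_ip: str, prefix_len: int) -> bool:
    """True if both addresses fall in the same network for the given mask length."""
    network = ipaddress.ip_network(f"{host_ip}/{prefix_len}", strict=False)
    return ipaddress.ip_address(other_ip) in network


# With a /24 mask these two are on different networks; with a /23 they are not.
print(same_subnet("128.153.145.21", "128.153.144.1", 24))
print(same_subnet("128.153.145.21", "128.153.144.1", 23))
```

A firewall rule written for the right address but the wrong mask can quietly leave out hosts that were meant to match.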
Scar — 09/21/2023 4:44 PM
What happened last night is that I made the config all over again on the other new fiber switch that we had
Everything then worked besides hydra.
Things that were changed from before:
no PVIDs were set
The IP was set to 128.153.145.21 because the switch last in use (F2) had that ip.
I have a feeling the PVID or something else in the configuration (a small detail) was the only thing wrong
Today I looked at hydra, because that was the only thing that wasnt connected
I remembered that hydra not only was on multiple vlans, but I created a bridge for it a while back in "wave one", so I went to look at it.
Right now it is set on the private vlan. I asked if there was any point on it being on the public and/or other vlans (it was on all of them) and it seemed to be for the vms that COULD run on it.
Since no vms needed that help then, I commented out the bridge in the file in /etc/netplan/ and placed the regular config for a static ip. It now works.
works as in it can ping everything. There is a problem with a vm webservice that is used in COSI. I asked the last person who made and set the vm if there was a command I forgot that launches a vm. Will write command here if I learn about it, or what might be wrong
Scar — 09/28/2023 3:28 PM
Talos went down, memory stick issue
https://www.amazon.com/DDR3-1600MHz-PC3-12800-Unbuffered-Memory-Workstation/dp/B07D3YMPTV/ref=mp_s_a_1_3?crid=3BSZMR8RD97OS&keywords=ddr3%2Bram%2Becc%2Budimm&qid=1695929544&sprefix=ddr3%2Bram%2Becc%2Budimm%2B%2Caps%2C86&sr=8-3&th=1&psc=1
Scar — 09/29/2023 3:10 PM
We decided that hardware is degrading too much. We also don't have anything in the budget for a whole new server
This was all yesterday btw
I named the new server that we got from an alumni "talDOS" to put dhcp and both dns on
this hardware was from 2011. It didnt recognize ubuntu 22.04 because the kernel was too old, but it allowed me to poke in a grub shell for a bit before I made an ubuntu 20.04 bootable stick
there was no network, at first, but then someone later just switched the port and it was fine. I gave it a new ip address (I originally gave it talos which was still connected, which is why it may have been confused)
then I just pointed them (Juno) in the right direction. I have done stuff like this before, and I was busy, and he wanted to learn
today, we are going to colo to finally establish connection
rn, I just took off everything I did to the switch (since this was the first one I used and failed) so that everything is more smooth and can start from scratch
October 2023
This was the month I created the first version of the new topology and got feedback and approval from the lab (or at least those who came to the aftermeeting slot). I talk more about what my goals were in the topology in my formal capstone. There are also plans to address one member's concerns by implementing a KVM over IP.
We have had some big developments this month. My fellow lab director was trying to get a connection to COLO, which broke our network for a weekend. We had to wait for OIT’s help since it was a connection to one of their switches. OIT said the problem was with the transceiver used.
Even after that was fixed, we still could not get a connection to COLO. It is most likely that our fiber optic cables are broken, so more will need to be bought. When purchasing these items, I learned that there are different types of fiber optic cables. A cable is either multi-mode or single-mode, which comes down to the width of the glass core the light travels through. There is also LC UPC and SC UPC. UPC means Ultra Physical Contact (how the fiber end is polished), and LC vs SC is the connector type, with LC being the smaller one. For COSI, we needed LC UPC to SC UPC, duplex single-mode cables.
October 2023 - Logs
Scar — 10/05/2023 4:46 PM
Creating a new topology
we need to try getting a new fiber cable to colo and see if that is the issue
turns out fiber cables can get damaged pretty easily, which makes sense cause they are made of glass
Try 1
Scar — 10/05/2023 5:31 PM
Cary was working in the server room with others to make that test mirror.
A discord bot revealed that MIRROR was down. I asked if this was on purpose and it was not
There were no lights to mirror AND the internet was down
so we looked and ziltoid and talos were not getting power at all
then they did after Cary flipped some breaker switches
But then I noticed nothing was coming in from OIT
but then packets did come in after a minute
everything is fine now but it seems like there was just a power outage for a second
Scar — 10/13/2023 3:42 PM
A couple things that happened in the last week:
Cary broke the network connection when we were trying to get colo connection running. The problem was with the transceiver
On Wednesday, I presented my topology to the group. I got feedback and answered questions
I did have a very long discussion (that got a bit heated) about the accessibility decline of ziltoid if something bad happened to it.
I was talking to Cary last night and he mentioned something called "KVM over IP" that may allow us to have a kvm to everything in colo without having to go down there or use an internet connection (but using a direct connection)
Scar — 10/13/2023 3:55 PM
https://www.tomshardware.com/how-to/kvm-over-ip-raspberry-pi
https://www.intel.com/content/www/us/en/business/enterprise-computers/resources/kvm-over-ip.html
Scar — 10/20/2023 2:56 PM
We fucked up a NAS by taking a hard drive out. It was in raid. If it is in raid one, thats a bit better
https://www.prepressure.com/library/technology/raid#raid-0
learn about mdadm
which can check for raid configurations iff (if and only if) that was the package used to make the raid configurations in the first place
Cary just would look using `lsblk`
https://www.cyberciti.biz/faq/how-to-check-raid-configuration-in-linux/
https://raid.wiki.kernel.org/index.php/A_guide_to_mdadm
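For arrays that were built with mdadm, the kernel also lists them in `/proc/mdstat`, which can be read without extra tools. A rough sketch of pulling the RAID level out of that text; the sample below is made up in the usual shape of the file, and inactive arrays (which show no level) are simply skipped:

```python
import re

ARRAY_LINE = re.compile(r"^(md\d+) : (\w+) (raid\d+|linear)\b")

# Made-up sample in the usual /proc/mdstat layout, not output from the NAS.
SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      976630464 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""


def md_arrays(mdstat_text: str) -> dict[str, str]:
    """Map each active md device name to its RAID level, e.g. {"md0": "raid1"}."""
    arrays = {}
    for line in mdstat_text.splitlines():
        match = ARRAY_LINE.match(line)
        if match:
            arrays[match.group(1)] = match.group(3)
    return arrays


print(md_arrays(SAMPLE))
```

On a real machine the text would come from reading `/proc/mdstat`; an empty result means no active md arrays, which says nothing about hardware RAID.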
November 2023
Not much happened here besides getting the fiber optic cables and there was now a connection between COSI and COLO. It was not turned on at this point.
November 2023 - Logs
Scar — 11/28/2023 5:39 PM
Monday before break 11/20 we finally got our fiber cables in the mail.
So, we had OIT look at everything for the switch. They switched out the transceiver and so that was good, so we deduced it was a fiber cable issue
We got the cables, put one in cosi and the other in colo and we passed the blinky light test.
Cary even connected directly to bacon and was able to ping google
yay!
January 2024
Getting right back into the school year, it has been a long time since someone regenerated the passwords for our devices. One of the new switches has a password of “password” for convenience and was never switched to something else. It would be better to change most of those that have been used for many years. Also, the lockbox that the lab directors are able to unlock that hosts these root passwords is very outdated and messy. I created a new password template with an emphasis on marking the date it was made, so that there would be no mistake on which information is the most modern.
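The point of dating every entry is that the newest one can be picked without guessing. A small sketch of that idea; the `PasswordEntry` fields are hypothetical and are not the actual template, and the record deliberately stores where a secret lives rather than the secret itself:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PasswordEntry:
    device: str      # e.g. a switch or server name
    created: date    # the date the password was generated
    location: str    # where the secret is kept; never the secret itself


def latest_per_device(entries: list[PasswordEntry]) -> dict[str, PasswordEntry]:
    """Keep only the most recently created entry for each device."""
    latest: dict[str, PasswordEntry] = {}
    for entry in entries:
        current = latest.get(entry.device)
        if current is None or entry.created > current.created:
            latest[entry.device] = entry
    return latest


# Invented example entries: an old record and a newer one for the same device.
entries = [
    PasswordEntry("example-switch", date(2019, 1, 1), "lockbox, old sheet"),
    PasswordEntry("example-switch", date(2024, 1, 17), "lockbox, new sheet"),
]
print(latest_per_device(entries)["example-switch"].created)
```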
I created a plan on how a server moving day (now being named THE GREAT COSI MOVE OF 2024) could be organized. It involves two teams, with one at COLO and one at COSI. Presented it to interested members after a lab meeting. We also prepared for THE GREAT COSI MOVE OF 2024 by cleaning out COLO entirely.
I learned how to create a Discord bot for pinging our services to find if anything went down. It was having many problems with spamming the lab members overnight, but it was able to detect Hydra being unreachable. More improvements can be made. At least I have something for the management that will be useful for lab members.
January 2024 - Logs
Scar — 01/18/2024 3:05 PM
Yesterday I presented my standards for password documentation
And I made a schedule for when the topology changes
Scar — 01/22/2024 1:48 PM
Last friday I made a bot that simply pings each server and alerts when something cannot be pinged
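The core of a bot like that can be quite small. A minimal sketch, not the actual bot: it assumes a Linux `ping` with the `-c` and `-W` flags, leaves out the Discord side entirely, and the function names are made up. The ping function is passed in so the logic can be tried without touching the network.

```python
import subprocess
from typing import Callable, Iterable


def ping_once(host: str) -> bool:
    """Send one ICMP echo with the system ping (Linux flags) and report success."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unreachable(hosts: Iterable[str], ping: Callable[[str], bool] = ping_once) -> list[str]:
    """Return the hosts that did not answer, in the order they were given."""
    return [host for host in hosts if not ping(host)]


# Stand-in ping for demonstration: pretend only hydra is down.
print(unreachable(["hydra", "mirror", "talos"], ping=lambda h: h != "hydra"))
```

The list that comes back is what would get turned into an alert message.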
Scar — 01/31/2024 11:03 AM
I forgot to say this: On January 19th, Cary, Juno, and I took a quick trip to COLO to clean everything out
The problem right now is that we are waiting for universal rails for COLO, which also means that we are waiting to know who to send purchase orders to since the original person moved jobs and the next person quit. Maciel is looking into that for us
Scar — 01/31/2024 11:05 AM
Some ideas for the ping bot are to have it slow down over night so that it does not spam everyone when they wake up in the morning (even though it doesn't ping anyone)
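Slowing down overnight can be as simple as choosing the wait time from the clock. A sketch under assumed numbers: the 60 second and 15 minute intervals and the 11 PM to 7 AM quiet window are invented defaults, not what the bot uses.

```python
from datetime import time


def poll_interval_seconds(
    now: time,
    day: int = 60,
    night: int = 900,
    night_start: time = time(23, 0),
    night_end: time = time(7, 0),
) -> int:
    """Pick how long to wait before the next round of pings."""
    if night_start <= night_end:
        is_night = night_start <= now < night_end
    else:  # the quiet window wraps past midnight
        is_night = now >= night_start or now < night_end
    return night if is_night else day


print(poll_interval_seconds(time(2, 0)))   # overnight: slow polling
print(poll_interval_seconds(time(12, 0)))  # daytime: normal polling
```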
Scar — 01/31/2024 7:18 PM
We got new racks donated by OIT. We also got universal rails set up on mirror
Scar — 01/31/2024 8:53 PM
(ideas for the written capstone)
Talk about the challenges that happened with the university
Talk about the re-establishing of COSI culture and the lack of knowledge that passed down because of covid
February 2024
I kept working on my "ping bot", but it had issues, mostly stemming from the fact that my connection down at my woodstock (at the very bottom of the hill) is bad, even when wired. I also spent time on improving it a bit.
The COSI server room got a new rack and universal rails donated by OIT. Which is great for THE GREAT COSI MOVE OF 2024.
Working with OIT, I and one lab director (I gave it up for my last semester here) went down to COLO where we tested the uplink to the Hill campus. At first there was a problem with the transceiver on OIT’s end, but an uplink has now been installed and the COSI move can proceed.
THE GREAT COSI MOVE OF 2024 was put into full swing with seven people there to help out. We divided into two teams to move things around. One on the hill and one in COLO. After movement in COLO, there were some issues with connectivity, but they turned out to be minor. TalDOS did not boot correctly (cause it is so old) and some cables were in the wrong ports. That is not too bad.
The movement of servers in the server room was done on the break weekend Clarkson had. With all that done, THE GREAT COSI MOVE OF 2024 was complete!
February 2024 - Logs
Scar — 02/01/2024 1:05 PM
adjustments to the ping bot:
specific channel
say that an ip was pinged
less over night
channel = client.get_channel(12324234183172)
await channel.send('hello')
https://discordpy.readthedocs.io/en/stable/faq.html#how-do-i-send-a-message-to-a-specific-channel
128.153.221.225
Scar — 02/02/2024 1:13 AM
My bot is spamming
I was able to get it into a certain channel
I'll move it to my own and see why it doesnt ping ip addresses
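One common way to stop a monitor from spamming is to only speak up when a host's state changes, instead of on every failed ping. A minimal sketch of that idea (a swapped-in technique, not necessarily what the bot ended up doing; `state_changes` is a made-up name):

```python
def state_changes(previous: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Build one message per host whose reachability changed since the last round."""
    messages = []
    for host, is_up in current.items():
        was_up = previous.get(host, True)  # assume hosts not seen before were up
        if is_up != was_up:
            messages.append(f"{host} is {'back up' if is_up else 'unreachable'}")
    return messages


# Round 1: hydra drops. Round 2: nothing changed, so nothing is sent.
before = {"hydra": True, "mirror": True}
after = {"hydra": False, "mirror": True}
print(state_changes(before, after))
print(state_changes(after, after))
```

Each returned string would be sent once to the channel, so a server that stays down overnight produces one message instead of hundreds.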
Scar — 02/08/2024 1:30 PM
There are still problems with the purchasing for COSI. Specifically, COSI wants to get some universal rails in order to mount everything, but we don't know who can even put in the orders
the new lab directors are going to email OIT and get internet access moved to COLO. We want to keep the momentum of those who want to make the big move
Scar — 02/12/2024 4:19 PM
(feedback for the capstone paper)
Add a paragraph about COSI
Talk a bit about the surge and talk more about what I did and why I chose to do that
Make an executive summary
talk about the ping bot
in challenges, show how I had to be more flexible
give myself more credit, this is my "legacy"
explain more on topology and say how the power outages recently did not stop us
Scar — 02/14/2024 4:29 PM
Using mdbook for the first time for my logs
Scar — 02/15/2024 3:57 PM
Orange-blue direct link to 144 net
Scar — 02/15/2024 4:07 PM
Green brown sc
Scar — 02/15/2024 4:41 PM
So, what was said above was for documentation purposes. it shows which link goes where, either to the 144 net (and the internet) or to the hill
Juno was at OIT help desk when they told him that COSI got the link. He asked everyone in cosi at the time if they wanted to go, and only he and I went.
We got down there and we were having trouble because even with a new transceiver and fiber cable, nothing was working. No blinky lights, so Juno called OIT.
They came down there, and we debugged. It was OIT's transceiver that was broken. Oh well
we got blinky lights and everything is ready for the great move besides getting rails
which, by the way, we are trying to figure out the best time for, but haven't found it yet. Especially since my capstone is due March 1st
Also, I got mdbook in a container on tiamat to host my logs online.
there was some difficulty with the working directory in the Dockerfile, but I stole the one from the cosi book and it worked. I could not recreate the error to learn from it
Scar — 02/16/2024 9:33 PM
too much rebooting on taldos when I was trying to test my pingbot (it freaked out on boot and someone had to manually turn it back on)
Scar — 02/18/2024 5:43 PM
We unplugged our internet link in the server room and went down to colo to plug in the net link there.
We got blinky lights on both the uplink to sc and from the connection oit provided to us, and we were able to confirm mirror was up, that means we are ready to move
Scar — 02/20/2024 5:28 PM
The great cosi move of 2024 has started
7 people are helping
We took out mirror, kasper, and taldos. We noticed that cables were mislabelled
I personally am going to move things to colo
Scar — 02/20/2024 6:45 PM
Going smoothly till we couldn't ping mirror
"Copper" apparently means ethernet
Cat8 is gold plated
Scar — 02/20/2024 9:30 PM
Here is the summary of what went down now that the move is over for tonight
The colo team got mirror, a kvm, taldos and kasper in physically. (with the right connections?) We knew we needed to make sure things are connected, and to make sure the switch->kasper fiber cables were in the right ports for the bridge
There was some issue where mirror can be pinged but there was no ssh connection from Cary's laptop
I did not bring my laptop, so I did not remember the password (for the switch), and when we did get it, it did not seem to work for some reason
we tried the management port, switching the fibers in kasper, the different vlans, and there was difficulty connecting to the switch
My team then came back from colo. The people in the server room reorganized cables, but did not reorganize the servers
When Juno went down there for the second time, it seemed that taldos made a mistake on boot because ITS SUPER OLD
now there may be issues with kasper firewalls, and what can be pinged changes depending on whether the firewall is on or off, and colo might not recognize ipv6
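The "mirror can be pinged but there was no ssh connection" situation above can be narrowed down by testing the TCP port directly instead of relying on ping alone. This is only a sketch: the function names are made up for illustration, and port 22 is the usual SSH default, not something confirmed for mirror.

```python
import socket


def tcp_port_open(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def describe_reachability(ping_ok: bool, ssh_ok: bool) -> str:
    """Turn the two checks into a hint about where to look next."""
    if ping_ok and ssh_ok:
        return "host is up and ssh is reachable"
    if ping_ok:
        return "host answers ping but the ssh port is closed or filtered (check firewall and sshd)"
    if ssh_ok:
        return "ssh is reachable but ping is not (ICMP is probably filtered)"
    return "host is unreachable (check cables, ports, and vlans)"
```

For example, `describe_reachability(True, tcp_port_open("mirror", 22))` would give the second message in the case described above, where `"mirror"` stands in for whatever hostname or IP is actually being tested.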
Scar — 02/20/2024 9:37 PM
what needs to be done is:
- get mirror, taldos, and kasper in shipshape
- get the other servers in the server room moved around to our liking (it won't be the same as on the diagram because of issues of square vs circle holes for the racks)
- get the ping bot down there with a server and possibly that kvm over ip
we will have to schedule another day to continue
but then again, the work in cosi should never really ever be done
Scar — 02/21/2024 1:28 PM
Juno went down to colo. They switched the fibers on kasper on the config and rebooted taldos so that it didn't get stuck. Everything is good now
Scar — 02/22/2024 2:28 PM
Took out ziltoid
Scar — 02/25/2024 10:19 PM
I didn't get a chance to put this all down in writing. This all happened on the 22nd and 23rd
It's a break weekend and I wanted to finish moving around the servers to complete THE GREAT COSI MOVE OF 2024
Going to complain a bit here: the person who said they would help me (because some of those servers are heavy) didn't show up on the first day. My other friend showed up later and they helped out. When I was alone, I had to resort to stacking books to help me hold things in place (reminds me of January 2023, when I had to put Talos back into the rack alone)
over the two days, I moved the servers around (a good amount by myself, though I should credit the friend on the first day and the late person on the second day)
the research machines were on the bottom. The student machines were on the top.
The big things to learn are the different types of rails, and also how different racks have different holes that work with different rails. i.e. square holes can be used with one type of rail, and there were two other kinds of holes that were both circles, but of different sizes. That is something to look for when moving things, because this did change the initial plan
Scar — 02/25/2024 10:26 PM
today, I just got the Docker image for my website (to host this all on) built, and I pulled it on zones so that it would be on taldos. I would go on terminator and make it real, but I do not have the password on me and I have way too little energy to find it.
Scar — 02/26/2024 11:06 AM
going through my capstone paper, I learned that conditioned power means to supply a proper voltage (as well as other characteristics)
Scar — Today at 12:09 AM
Heck yeah I got a fist bump from Jeanna
March 2024
March 2024 - Logs
Scar — 03/06/2024 10:42 PM
got ziltoid OS reinstalled. Taught someone how to do that, but I will probably reinstall it with live server, since I only used desktop because I thought it would be less scary to teach and they would be more likely to use it
Juno, Amity, and I did some book documentation
Scar — 03/14/2024 10:42 PM
I walked ziltoid down to COLO. Luckily one COSI alum I knew was walking around as well so it was nice to have a second pair of hands. Ziltoid was updated with Ubuntu Live Server for the ping bot. We got it mounted, but I forgot to get a transceiver and I brought the wrong power cable
Scar — 03/15/2024 10:42 PM
I went down again before the spring break with Juno. We just wanted to plug ziltoid in so that I can work on it over the break, but it turns out the transceiver we had is broken, so it will have to be connected when we get back to Clarkson
Oh! I also wanted to get the units so that I can write it on book.
April 2024
Things here got slower because it was starting to become crunch time at Clarkson. Now it is just working with new hardware as we get it.
April 2024 - Logs
Scar — 04/10/2024 7:36 PM
Yesterday we got two new servers. (3 new servers in total)
We got taltres up, and we got to put dhcp and dns on it
Also, today, since we are putting more servers down to colo, I made a slight change to the current topology
Capstone Paper
Making this better will be figured out later (hopefully). Final Capstone Paper
Contact Info
Clarkson: carlonsm@clarkson.edu
Personal: sophiamcar@gmail.com
Acknowledgements
I would like to thank the following people for their help during this capstone:
- Jeanna Matthews: Capstone advisor, lab faculty member, a very nice person
- Alexis Maciel: Chair of CS that had to deal with my shenanigans and COSI people raiding his office
- Madison Mahady: Fought for the COSI purchases
- Jonathan Nordby: Fellow lab director and TI-basic dude
- Sydney DeCyllis: Fellow lab director and always adding extra middle names
- Cary Keecsler: Fellow lab director (last one I promise)
- Juno Meifert: Lab director (told ya ^^)
- Peter Lef: Who said I should only say negative things about him.
- Mom: You dealt with me
- Dad: Thanks for squeezing into the server room