A Walk through beautiful Vienna
District 1: Systems Design (Senacor Office)
- Design Steps:
Senacor Office- Scope the problem: build what interviewer wants: what, who, where, whenJapanese Embassy- Make reasonable assumptionsVotivkirche- Draw major componentsFancy Spar- Identify key issues (bottlenecks)U5 Site- Redesign key issuesTown Hall- Concepts: Horizontal scaling: adding additional resourcesBurgtheater- Vertical scaling: propping up existing resources, easierParliament Cupola- Load Balancing: distribute server load so it doesn’t crash by cloningLaw Palace- Denormalization: adding redundant info to speed up reads (because joints are slow)VOGA Club- NoSQL: does not support jointsParliament- Sharding: splitting data across multiple machinesVolksgarten- vertical: by function, e.g. one table for profiles, one for messagesHofburg- key-based: some part of data, m mod N (N: number of servers)Kanzleramt- directory-based: maintain a look-up tableHofburg- Caching: provide simple key-value pairing (query & result) between application, and data storeHofreiterschule- Async processing: for slow operations (like top forum posts, loading DNN)Raiba @ Graben- Network metrics:Full Square- bandwidth: max amount of data in tReal Square- latency: time from start to endTime to Raiba- throughput: actually amount of data in tMeinl- Map/reduce: map takes data & returns key: value, reduce takes key:value and reduces it e.g., key: sum(values)Brunnen- Considerations: Failure: plan for test casesGreen Habsburghouse- Availability: % system is operationalDo & Co- Reliability: % that systems is operationalStephansdom- Read-heavy: consider cachingTower queue of Dom- Write-heavy: consider queuing writesSecurity Peeps at Dom- Security: plan for test cases
District 2: Object-Oriented Design (Salztor-Bridge)
Salztorbruecke- Approach: Handle ambiguity: who is going to use system & how, when, what, where, whyRLB- Define core objects: mostly nounseforce- Analyze relationships: 1-to-1, 1-to-many, many-to-manyIBM- Identify actions: mostly verbs of objectUniqa- Design Patterns: Singleton: ensures that a class has only one instance (e.g. restaurant)Sternwarte- Factory: offers interface for creating instance of a class with subclass deciding which class to instantiate (e.g. pipeline)
District 2: API Calls (RDG)
coffee- Get: retrieve resourcemy office- Post: create new resourceSusanne- Put: edit / updateDietmar- Delete: remove resourceHarald & Seb- Object orientation: Difference of state & functionMarcus Presich- Everything is an object in Python
District 3: Sorting & Searching Algorithms (Stadtpark)
Enter- Bubble Sort: swap element “previous” and “current” if current < previous, continueGo to Statue- Selection Sort: find smallest element, and swap with first elementLeave to U4- Merge Sort: divide arrays in half, sort halves, merge backGerman Music Hall- Quick Sort: pick “random” element, divide array so that smaller numbers left “random”, and great numbers rightTrump Hotel- Radix Sort: for ints, group digits of numbersIce-Skating Rink- BST: in sorted array, compare x to mid, left if lower, right if higher
District 3: Stacks and Queues (Landstrasse )
S7- Stack: dishes - : LIFO - pop (get first item), push (add item to top)Billa- Queue: queue - : FIFO - add (add last item), remove (delete first item)
District 4: LinkedList, Stack and Queue (Belvedere)
Belevedere- LinkedList: represents sequence of nodes (singly or doubly linked)Russian Monument- class Node: val, next;French Embassy- class Node runner technique: Have two runners, one much faster
District 5: Math & Logic (Paul’s home)
Paul's old flat- Prime numbers: iterate from 2 to sqrt(n)Pilgramgasse- Probability of A&B: P(A and B) = P(B given A) * P(A) (Buenos Aires Airport)Unemployment Center- Probability of A or B: P(A) + P(B) - : P (A and B)Naschmarkt Foods- Independence: one happening tells you nothing about the otherNaschmarkt Restaurants- Mutual exclusivity: if one happens, the other cannot happen
District 6: Recursion (Chili & Tofu)
Chili & Tofu- Recursion: solutions are build off smaller solutionsKaro flat up- Bottom-up: solve problems for simple caseKaro flat down- Top-down: divide problem for case N into subproblemsGumpendorfer Stop- Half-half: divide data in half, binary searchAIDS Hilfe- Dynamic programming: cache repeated calls from recursive algos
District 7: Trees, Heaps and Graphs (Ring)
Ring- Tree - data structure composed of nodesVolkstheater- Every tree has a root nodeStiftsgasse- With zero or more child nodesSiebenstern- Tree cannot contain cyclesNeubaugasse- A binary tree’s nodes have each up to two children, ternary trees have threeZieglergase- A node is a leaf if it has no children-
Kaiserstrasse- Binary search tree: has all left descendants <= n < all right descendants Felzl- Balanched vs. unbalanced: red-black vs AVL trees.Billa- Complete binary tree: every level (except perhaps last) is fully filledDenn's- Full binary tree: no nodes have only one child (0 or 2)-
Prosi- Perfect binary tree: full & complete Burggasse: Traversal: in-order : left > current node > right branch-> Heiligenstadtpre-order: current node before children-> Schoenbrunnpost-order: root node is last visited
Polizei: Min-Heap: complete binary tree & each node is smaller than its children -> root is minimum55/3: Graphs: Collection of nodes with edgesBell: Directed or undirectedEntrance: Acyclic graph has no circleDoor (to trashes): Adjacency list: every vertex (node) stores its adjacent verticesTrash: Adjacency matrix: NxN bool matrixSteps: Graph search: DFS: start at root, and explore each node completely > visit every nodePostal boxes: BFS: start at root, and explore each neighbor completely > find shortest pathBell 2: Bidirectional search: using to BFS to find the shortest path
District 8: Testing (Fethi)
Fethi- Interviewer looks for:Felzl- Test cases: Normal, extreme, null/illegal, strangeHummel Cafe- Big picture: Amazon’s priority is not to charge twiceHammerling-Park- Piece integration: Google sheets integration with mailVolkskundemuseum- Organization: Break into categories-
Altes AKH- Practicality: Create a testing plan Tunnel- Real world object - : Paperclip:- Who will use it? Teacher
- What are the use cases? exams of students
- What are the usage bands? temperature, 30 pages
- Stress / failure condition? 60 + pages
-
How to perform testing? Can’t sit 5 years
Polizei Fuhrmann- Software - : Parental Control:- Who will use it? Children, parents
- What are the use cases? blocking porn, gambling
- What are the usage bands? Block more
- Stress / failure condition? Child has access
-
How to perform testing? Black or Whitebox, break into components
Theater Josefstadt- Troubleshooting:- Understand scenario: how long, which version, consistently, error report
- Problem breakdown: break into testable steps
- Create specific tests: think of walking a customer through
District 10: Arrays & Strings (Train Station)
Bahnhof- Hash tables: key mod 13Erste Bank- Array list: size is defined upon creationTennis Arsenal- Stringbuilder: creates resizable array of small strings
District 13: O-Notation (Grig’s flat)
Elevator- Runtime efforts: logN: binary searchDoor- NlogNLiving room- NKitchen- N^2Toilet- 2^N, recursion if f(n-1) + f(n-2)Children- O(branches ^ depth)HallwayCost: big O (upper limit)Grigs room- big Omega (lower limit)Bathroom- big Theta (bands)Police in 13th- Rules: Drop constant: O(2N) > O(N)Billa- Drop non-dominant terms: O(N^2 + N) > O(N^2)Spar- Multipart algos: Add runtimes: for a in a, for b in b > O(A+B) > O(N)Kebab- Multiply runtimes: for a (for b in b) in a > O(A * B) > O(N^2)
District 14: Pull Requests (Haralds Home)
Entrance- Fork projectKitchen- Clone locallyEating- Create a branchCouch- Make changesGrass- Push back to repoOffice- Click ‘compare & pull request’
District 19: Threads & Locks
U4 Heiligenstadt- Threads: in a process share same memory spaceBakery- Sync: restrict access to shared resourcesBilla- Locks: sync access to data by associating resources with a lockRBI- Deadlock: one threat waits for object lock to be released while others do the sameRSG Main- conditions: mutual exclusion: there is limited access to one resourcePosch New- hold & wait: processing holding resources can request morePosch Old- no preemption: one process cannot function without the otherMensa- circular wait: two ore more processes form a circular chain
Additional Concepts
- Difference between RDD and DataFrame
-
RDD (resilient distributed data set): each record is an independent entity/object row-wise -
Data frame (DF): perform transformations on columns column-wise
-
- Spark:
- Analytics factory: gains capacity by scaling horizontally
- Instructed by a master, executors perform the work
- Cluster manager plans the capacity when receiving the driver programme
- Lazy evaluation: Spark only works when it needs to report results .show()